Capacity Region of Finite State 
Multiple-Access Channel with Delayed State 
Information at the Transmitters 

Uria Basher, Avihay Shirazi, and Haim Permuter 
Abstract 

A single-letter characterization is provided for the capacity region of finite-state multiple access channels. The 
channel state is a Markov process, the transmitters have access to delayed state information, and channel state 
information is available at the receiver. The delays of the channel state information are assumed to be asymmetric 
at the transmitters. We apply the result to obtain the capacity region for a finite-state Gaussian MAC, and for a 
finite-state multiple-access fading channel. We derive power control strategies that maximize the capacity region for 
these channels. 

Index Terms 

Capacity region, delayed feedback, directed information, finite-state channel, Gaussian multiple-access channel, 
multiple-access channel, multiplexing coding scheme, successive decoding. 

I. INTRODUCTION 

Wireless communication is an example of channels where the channel characteristics are time-varying. In a 
wireless setting, the user's motion and the changes in the environment, as well as the interference, may lead to 
temporal changes in the channel quality. Such channel variation models can include fast fading due to multi-path 
and slow fading due to shadowing. In fast fading, the channel state is assumed to be changing for every channel 
use, while in slow fading, the channel is assumed to be constant for each finite block length. 

In such communication problems, the channel state information (CSI) can be transmitted to the transmitters either 
explicitly, or through output CSI feedback. Frequently, the CSI feedback is not instantaneous; the transmitters have 
only delayed information regarding the state of the channel. The availability of the delayed CSI at the transmitters 
will possibly increase the capacity region. The increase in the capacity region due to CSI depends on the CSI delays 
relative to the rate at which the channel is time-varying. When a channel is slowly time-varying and the delays are 
small, CSI may significantly increase the capacity region. However, if the channel is changing rapidly relative to 
the CSI delays, the transmitters can no longer adapt to the channel variations. Hence, availability of delayed CSI 
may not result in any significant capacity region improvement. Therefore, we are motivated to study the effect of 
channel memory and delays on the multiple access channel (MAC) capacity region. 
Let us now present a brief literature review. We model a time-varying channel as a finite-state Markov 
channel (FSMC) [1], [2]. The FSMC is a channel with a finite number of states. During each symbol transmission, 
the channel's state is fixed. The channel transition probability function is determined by the channel state. The time 
variation in the channel characteristics is modeled by the statistics of the underlying state process. 

The capacity of memoryless channels, with different cases of state information being available in a causal or 
noncausal manner at the transmitter and at the receiver, has been studied by Shannon [3] and by Gelfand and Pinsker 
[4]. In [5], Goldsmith and Varaiya consider fading channels with perfect CSI at the transmitter and at the 
receiver. They proved that with instantaneous and perfect state information, the transmitter can adapt the data 
rates for each channel state to maximize the average transmission rate. Viswanathan [6] loosened this assumption 
of perfect instantaneous CSI, and gave a single-letter characterization of the capacity of Markov channels with 
delayed CSI. Caire and Shamai [7] consider the case where the channel state is independent and identically 
distributed (i.i.d.), and the CSI at the transmitter is a deterministic function of the CSI at the receiver. They showed 
that optimal coding is particularly simple. Chen and Berger [8] found the capacity of an FSC with inter-symbol 
interference (ISI), where current CSI is available at the transmitter and the receiver. For a comprehensive survey on 
channel coding with state information see [9]. 

The MAC with state has received much attention in recent years due to its importance in wireless communication 
systems. On the one hand, complete knowledge of the CSI at the transmitters is an unrealistic assumption in wireless 
communications. On the other hand, it is reasonable to assume that the receiver does possess full knowledge of 
the CSI. This practical consideration has motivated the investigation of a MAC where each transmitter is informed 
with its own CSI, while the receiver is informed with the full CSI. 

Our work is also related to [10], [11], and [12]. In [10], Como and Yüksel found the capacity region of the FS-MAC, 
where the channel state process is i.i.d., the transmitters have access to partial (quantized) CSI, and complete 
CSI is available at the receiver. In [11], the capacity of a general FS-MAC with varying degrees of causal CSI at 
the transmitters is characterized in non-single-letter formulas. In [12], the capacity region of the FS-MAC with 
feedback that may be an arbitrary time-invariant function of the channel output has been derived. Recent related 
work also includes [13], which studies the state-dependent MAC with causal and strictly causal side information 
at the transmitters. 

In this work, we consider the capacity region of a finite-state Markov multiple-access channel (FSM-MAC) with 
CSI at the decoder (receiver) and delayed CSI at the encoders (transmitters) with delays $d_1$ and $d_2$, as illustrated 
in Fig. 1. The channel probability function at each time instant depends on the state of an underlying finite-state 
Markov process. The decoder, in addition to the channel output, also receives the channel state at each time instant 
(perfect CSI). The channel state is fed back to the encoders through a noiseless feedback channel. CSI from the 
decoder is received at Encoder 1 and Encoder 2 after time delays of $d_1$ and $d_2$ symbol durations, respectively. Each 
encoder, at each time instant, chooses the channel input based on the message to be transmitted and the CSI that 
it possesses. A formal description of the system model is presented in Section II. The main result of this paper is 
a computable characterization of the capacity region for this channel model. 
Fig. 1: FSM-MAC with CSI at the decoder and delayed CSI at the encoders with delays $d_1$ and $d_2$. The state 
process has memory and is assumed to be FSM. The CSI is fed back to the encoders through a noiseless feedback 
channel. CSI from the decoder is received at Encoder 1 and Encoder 2 after time delays of $d_1$ and $d_2$ symbol 
durations, respectively. We consider the above problem setting in the cases where $d_1 \geq d_2$, $d_1 = d_2$, and 
$d_2 < d_1 = \infty$. 



The remainder of the paper is organized as follows: In Section II we concretely describe the communication 
model. In Section III we state our main results, which are the capacity regions for different cases of time delays. 
Section IV provides the upper bound on the capacity region of the FSM-MAC with CSI at the decoder and asymmetrical 
delayed CSI at the encoders. In Section V we complete the proof of the capacity region by providing the proof 
of the achievability. In Section VI we provide an alternative proof for the capacity region. The alternative proof is 
based on a multi-letter expression for the capacity region of the FS-MAC with time-invariant feedback [12]. In 
Section VII we apply the general results of Section III to obtain the capacity region for a finite-state Gaussian MAC, 
and for a finite-state multiple-access fading channel. We derive optimization problems on the power allocation that 
maximize the capacity region for these channels. This power allocation would be the optimal power control policy 
for maximizing throughput in the presence of delayed CSI. We conclude in Section VIII with a summary of this 
work. 

II. CHANNEL MODEL AND NOTATION 

A. Channel Model 

In this paper, we consider the communication system of an FSM-MAC with CSI at the decoder and delayed CSI at 
the encoders with delays $d_1$ and $d_2$, respectively, as illustrated in Fig. 1. The MAC setting consists of two senders 
and one receiver. Each sender $j \in \{1,2\}$ chooses an index $m_j$ uniformly from the set $\{1, \ldots, 2^{nR_j}\}$, independently 
of the other sender. The input to the channel from encoder $j \in \{1,2\}$ is denoted by $\{X_{j,1}, X_{j,2}, X_{j,3}, \ldots\}$, and the 
output of the channel is denoted by $\{Y_1, Y_2, Y_3, \ldots\}$. We use the notation $V^n$ to denote the sequence $(V_1, \ldots, V_n)$; 
therefore, $X_j^n$ and $Y^n$ denote the sequences $\{X_{j,1}, \ldots, X_{j,n}\}$ and $\{Y_1, \ldots, Y_n\}$, respectively. A finite-state Markov 
channel is, at each time instant, in one of a finite number of states $\mathcal{S} = \{s_1, s_2, \ldots, s_k\}$. In each state, the channel 
is a DMC with input alphabets $\mathcal{X}_1$, $\mathcal{X}_2$ and output alphabet $\mathcal{Y}$. Let the random variables $S_i$, $S_{i-d}$ denote the 
channel state at times $i$ and $i-d$, respectively. Similarly, denote by $X_{1,i}$, $X_{2,i}$, and $Y_i$ the inputs and the output 
of the channel at time $i$. The channel transition probability function at time $i$ depends on the state $S_i$ and the 
inputs $X_{1,i}$, $X_{2,i}$ at time $i$, and is given by $P(y_i|x_{1,i},x_{2,i},s_i)$. The channel output at any time $i$ is assumed to 
depend only on the channel inputs and state at time $i$. Hence 

$$P(y_i|x_1^i,x_2^i,s^i) = P(y_i|x_{1,i},x_{2,i},s_i). \tag{1}$$

The state process $\{S_i\}$ is assumed to be an irreducible, aperiodic, finite-state homogeneous Markov chain and hence 
is ergodic. The state process is independent of the channel inputs and output when conditioned on the previous 
states, i.e., 

$$P(s_i|s^{i-1},x_1^{i-1},x_2^{i-1},y^{i-1}) = P(s_i|s_{i-1}). \tag{2}$$

Furthermore, we assume that the state process is independent of $M_1$ and $M_2$, 

$$P(s^n,m_1,m_2) = P(s^n)P(m_1)P(m_2) = \left[\prod_{i=1}^{n} P(s_i|s_{i-1})\right] P(m_1)P(m_2). \tag{3}$$

Now, let $K$ be the one-step state transition probability matrix of the Markov process, and let $\pi$ be the steady 
state probability distribution of the Markov process. The joint distribution of $(S_i, S_{i-d})$ is stationary and is given by 

$$\pi_d(S_i = s_l, S_{i-d} = s_j) = \pi(s_j)K^d(s_l,s_j), \tag{4}$$

where $K^d(s_l,s_j)$ is the $(l,j)$th element of the $d$-step transition probability matrix $K^d$ of the Markov state process. 
For simplicity, let us define $(S, \tilde{S}_1)$ as the variables that have the same joint distribution as $(S_i, S_{i-d_1})$, i.e., 

$$P(S = s_l, \tilde{S}_1 = s_j) = \pi_{d_1}(S_i = s_l, S_{i-d_1} = s_j) = \pi(s_j)K^{d_1}(s_l,s_j). \tag{5}$$

Similarly, we define $(S, \tilde{S}_2)$ as the variables that have the same joint distribution as $(S_i, S_{i-d_2})$. 
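
The joint distribution of the current state and the $d$-step delayed state can be computed numerically from the transition matrix. The following is a minimal sketch, not part of the original development: the two-state matrix `K` is a hypothetical example, the helper names are illustrative, and the code assumes a row-stochastic convention in which `K[a][b]` is the probability of moving from state `a` to state `b`, so the index order differs from the $(l,j)$ indexing used in (4).

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(k_mat, d):
    """Return the d-step transition matrix (d >= 0) by repeated multiplication."""
    n = len(k_mat)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(d):
        result = mat_mul(result, k_mat)
    return result

def stationary(k_mat, iters=10_000):
    """Approximate the steady-state distribution pi by power iteration."""
    n = len(k_mat)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[a] * k_mat[a][b] for a in range(n)) for b in range(n)]
    return pi

def delayed_joint(k_mat, d):
    """joint[l][j] = P(S_i = l, S_{i-d} = j) = pi(j) * P(S_i = l | S_{i-d} = j)."""
    pi = stationary(k_mat)
    k_d = mat_pow(k_mat, d)
    n = len(k_mat)
    # k_d[j][l] is the probability of moving from state j to state l in d steps.
    return [[pi[j] * k_d[j][l] for j in range(n)] for l in range(n)]

# Hypothetical two-state chain, chosen only for illustration.
K = [[0.9, 0.1],
     [0.2, 0.8]]
joint = delayed_joint(K, d=2)
```

With $d = 0$ the joint distribution is diagonal (the encoder knows the current state); as $d$ grows it approaches the product $\pi(s_l)\pi(s_j)$, which reflects the loss of useful state information with delay.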

B. Code Description 

An $(n, 2^{nR_1}, 2^{nR_2}, d_1, d_2)$ code for the FSM-MAC with CSI at the decoder and delayed CSI at the encoders with 
delays $d_1$ and $d_2$ consists of 

1) Two sets of integers $\mathcal{M}_1 = \{1, 2, \ldots, 2^{nR_1}\}$ and $\mathcal{M}_2 = \{1, 2, \ldots, 2^{nR_2}\}$, called the message sets. 

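2) For each encoder, an encoding function $f_j$, $j \in \{1,2\}$, maps the set of messages to channel input words of 
block length $n$. Each $f_j$ works through a sequence of functions $f_{j,i}$ that depend only on the message $M_j$ and 
the channel states up to time $i - d_j$. For encoder 1 ($j = 1$): 

$$X_{1,i} = \begin{cases} f_{1,i}(M_1), & 1 \leq i \leq d_1, \\ f_{1,i}(M_1, S^{i-d_1}), & d_1 + 1 \leq i \leq n. \end{cases} \tag{6}$$

Similarly for encoder 2 ($j = 2$): 

$$X_{2,i} = \begin{cases} f_{2,i}(M_2), & 1 \leq i \leq d_2, \\ f_{2,i}(M_2, S^{i-d_2}), & d_2 + 1 \leq i \leq n. \end{cases} \tag{7}$$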

3) A decoding function that maps a received sequence of $n$ channel outputs and channel states to the message 
set, 

$$g : \mathcal{Y}^n \times \mathcal{S}^n \to \mathcal{M}_1 \times \mathcal{M}_2. \tag{8}$$

We define the average probability of error for the $(n, 2^{nR_1}, 2^{nR_2}, d_1, d_2)$ code as follows: 

$$P_e^{(n)} = \frac{1}{2^{n(R_1+R_2)}} \sum_{m_1,m_2} \sum_{s^n} P(s^n) \Pr\{g(Y^n, s^n) \neq (m_1,m_2) \mid (m_1,m_2) \text{ was sent}\}. \tag{9}$$

We use standard definitions [14] of achievability and capacity region, namely, a rate pair $(R_1,R_2)$ is achievable 
for the FSM-MAC with CSI at the decoder and delayed CSI at the encoders with delays $d_1$ and $d_2$ if there exists a 
sequence of $(n, 2^{nR_1}, 2^{nR_2}, d_1, d_2)$ codes with $P_e^{(n)} \to 0$ as $n$ goes to infinity. The capacity region is the closure 
of the set of achievable $(R_1,R_2)$ rate pairs. 

III. MAIN RESULTS 

Here we present the main results of this paper. Recall that the joint distributions of $(S, \tilde{S}_1)$ and $(S, \tilde{S}_2)$ are given 
in (5). Without loss of generality, let us assume that $d_1 \geq d_2$. 

Theorem 1 (Capacity region of FSM-MAC with delayed CSI, $d_1 \geq d_2$) 

The capacity region of the FSM-MAC with CSI at the decoder and asymmetrical delayed CSI at the encoders with 
delays $d_1$ and $d_2$, as shown in Fig. 1, is given by: 

$$\mathcal{R} = \bigcup_{P(u|\tilde{s}_1)P(x_1|\tilde{s}_1,u)P(x_2|\tilde{s}_1,\tilde{s}_2,u)} \left\{ \begin{array}{l} R_1 \leq I(X_1;Y|X_2,S,\tilde{S}_1,\tilde{S}_2,U), \\ R_2 \leq I(X_2;Y|X_1,S,\tilde{S}_1,\tilde{S}_2,U), \\ R_1+R_2 \leq I(X_1,X_2;Y|S,\tilde{S}_1,\tilde{S}_2,U) \end{array} \right\}, \tag{10}$$

where $U$ is an auxiliary random variable with cardinality $|\mathcal{U}| \leq 3$. 
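
To make the single-letter character of Theorem 1 concrete, the sketch below evaluates conditional mutual information terms of the kind that appear in the rate constraints for a small joint distribution. It is only an illustration under strong simplifying assumptions: the toy channel (a binary adder in state 0, a useless channel in state 1), the uniform i.i.d. state, and the uniform independent inputs are all hypothetical, and the delayed states and the auxiliary variable $U$ are omitted (equivalently, taken to be constant).

```python
import math
from itertools import product

def cond_mutual_info(joint, a, b, c):
    """Return I(A;B|C) in bits.

    joint maps outcome tuples to probabilities; a, b, c are tuples of
    coordinate positions selecting the variables A, B and C.
    """
    def marginal(idx):
        out = {}
        for outcome, p in joint.items():
            key = tuple(outcome[i] for i in idx)
            out[key] = out.get(key, 0.0) + p
        return out

    p_abc, p_ac = marginal(a + b + c), marginal(a + c)
    p_bc, p_c = marginal(b + c), marginal(c)
    total = 0.0
    for key, p in p_abc.items():
        if p > 0.0:
            ka = key[:len(a)]
            kb = key[len(a):len(a) + len(b)]
            kc = key[len(a) + len(b):]
            total += p * math.log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total

# Hypothetical toy channel: coordinates are (s, x1, x2, y).
# In state 0 the output is the sum x1 + x2; in state 1 the output is always 0.
joint = {}
for s, x1, x2 in product((0, 1), repeat=3):
    y = x1 + x2 if s == 0 else 0
    joint[(s, x1, x2, y)] = 0.5 * 0.25  # uniform state, uniform independent inputs

r1_bound = cond_mutual_info(joint, a=(1,), b=(3,), c=(2, 0))   # I(X1;Y|X2,S)
r2_bound = cond_mutual_info(joint, a=(2,), b=(3,), c=(1, 0))   # I(X2;Y|X1,S)
sum_bound = cond_mutual_info(joint, a=(1, 2), b=(3,), c=(0,))  # I(X1,X2;Y|S)
```

For this toy example the individual bounds are 0.5 bit and the sum bound is 0.75 bit, i.e., half of the values for the noiseless binary adder MAC, because the channel conveys nothing in state 1.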



The proof of Theorem 1 is presented in Sections IV and V. In Section IV we prove the upper bound on the capacity 
region, and Section V is devoted to the proof of the achievability. The proof of the achievability is based on a 
multiplexing coding scheme and successive decoding. In addition, we provide an alternative proof of Theorem 1 in 
Section VI. The proof of the cardinality bound on $U$ is presented in Appendix A. 

Now, directly from Theorem 1 we can derive the capacity region in the case of $d_1 = d_2$. Since $d_1 = d_2$ we have 
$\tilde{S}_1 = \tilde{S}_2$; hence we denote $\tilde{S} = \tilde{S}_1 = \tilde{S}_2$. Using Theorem 1 we get the following. 



Theorem 2 (Capacity region of FSM-MAC with symmetrical delayed CSI, $d_1 = d_2$) 

The capacity region of the FSM-MAC with CSI at the decoder and symmetrical delayed CSI at the encoders with delay 
$d$ is given by: 

$$\mathcal{R} = \bigcup_{P(u|\tilde{s})P(x_1|\tilde{s},u)P(x_2|\tilde{s},u)} \left\{ \begin{array}{l} R_1 \leq I(X_1;Y|X_2,S,\tilde{S},U), \\ R_2 \leq I(X_2;Y|X_1,S,\tilde{S},U), \\ R_1+R_2 \leq I(X_1,X_2;Y|S,\tilde{S},U) \end{array} \right\}, \tag{11}$$

where $U$ is an auxiliary random variable with cardinality $|\mathcal{U}| \leq 3$. 

Now we consider the case where encoder 1 does not have state information at all, i.e., $d_1 = \infty$. 

Theorem 3 (Capacity region of FSM-MAC with delayed CSI only to one encoder) 

The capacity region of the FSM-MAC with CSI at the decoder and delayed CSI only to one encoder is given by: 

$$\mathcal{R} = \bigcup_{P(q)P(x_1|q)P(x_2|\tilde{s},q)} \left\{ \begin{array}{l} R_1 \leq I(X_1;Y|X_2,S,\tilde{S},Q), \\ R_2 \leq I(X_2;Y|X_1,S,\tilde{S},Q), \\ R_1+R_2 \leq I(X_1,X_2;Y|S,\tilde{S},Q) \end{array} \right\}, \tag{12}$$

where $Q$ is an auxiliary random variable with cardinality $|\mathcal{Q}| \leq 3$. 

The proof of Theorem 3 is quite similar to the proof of Theorem 1; the details are presented in Appendix B. 

IV. CONVERSE 

In this section we provide the upper bound on the capacity region of the MAC with receiver CSI and asymmetrical 
delayed CSI feedback, i.e., we give the converse proof for Theorem 1. Without loss of generality, let us assume 
that $d_1 \geq d_2$. 

Proof: Given an achievable rate pair $(R_1,R_2)$, we need to show that there exists a joint distribution of the form 
$P(s,\tilde{s}_1,\tilde{s}_2)P(u|\tilde{s}_1)P(x_1|\tilde{s}_1,u)P(x_2|\tilde{s}_1,\tilde{s}_2,u)P(y|x_1,x_2,s)$ such that 

$$\begin{aligned} R_1 &\leq I(X_1;Y|X_2,S,\tilde{S}_1,\tilde{S}_2,U), \\ R_2 &\leq I(X_2;Y|X_1,S,\tilde{S}_1,\tilde{S}_2,U), \\ R_1+R_2 &\leq I(X_1,X_2;Y|S,\tilde{S}_1,\tilde{S}_2,U), \end{aligned}$$

where $U$ is an auxiliary random variable with cardinality $|\mathcal{U}| \leq 3$. The proof of the cardinality bound is presented in 
Appendix A. Since $(R_1,R_2)$ is an achievable rate pair, there exists a code $(n, 2^{nR_1}, 2^{nR_2}, d_1, d_2)$ with a probability 
of error $P_e^{(n)}$ arbitrarily small. By Fano's inequality, 

$$H(M_1,M_2|Y^n,S^n) \leq n(R_1+R_2)P_e^{(n)} + H(P_e^{(n)}) \triangleq n\epsilon_n, \tag{13}$$

and it is clear that $\epsilon_n \to 0$ as $P_e^{(n)} \to 0$. Then we have 

$$H(M_1|Y^n,S^n) \leq H(M_1,M_2|Y^n,S^n) \leq n\epsilon_n, \tag{14}$$

$$H(M_2|Y^n,S^n) \leq H(M_1,M_2|Y^n,S^n) \leq n\epsilon_n. \tag{15}$$
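
As a quick numerical illustration of the Fano term in (13), the sketch below evaluates $\epsilon_n = (R_1+R_2)P_e^{(n)} + H(P_e^{(n)})/n$, where $H(\cdot)$ is taken to be the binary entropy function in bits. The function names and the sample values are assumptions made for this example only.

```python
import math

def binary_entropy(p):
    """Binary entropy in bits, with the convention H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def fano_epsilon(n, r1, r2, p_err):
    """epsilon_n = (R1 + R2) * Pe + H(Pe) / n, i.e. the bound in (13) divided by n."""
    return (r1 + r2) * p_err + binary_entropy(p_err) / n

# Hypothetical values: total rate of 1 bit per channel use, block length 1000.
eps_values = [fano_epsilon(1000, 0.5, 0.5, p) for p in (1e-1, 1e-2, 1e-3)]
```

The values in `eps_values` decrease as the error probability decreases, which is the behavior the converse relies on when it lets $P_e^{(n)} \to 0$.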
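We can now bound the rate $R_1$ as 

$$\begin{aligned}
nR_1 &= H(M_1) \\
&= H(M_1) + H(M_1|Y^n,S^n) - H(M_1|Y^n,S^n) \\
&\overset{(a)}{\leq} I(M_1;Y^n,S^n) + n\epsilon_n \\
&\overset{(b)}{=} I(M_1;Y^n|S^n) + I(M_1;S^n) + n\epsilon_n \\
&\overset{(c)}{=} I(M_1;Y^n|S^n) + n\epsilon_n \\
&\overset{(d)}{\leq} I(X_1^n;Y^n|S^n) + n\epsilon_n \\
&= H(X_1^n|S^n) - H(X_1^n|Y^n,S^n) + n\epsilon_n \\
&\overset{(e)}{=} H(X_1^n|X_2^n,S^n) - H(X_1^n|Y^n,S^n) + n\epsilon_n \\
&\overset{(f)}{\leq} H(X_1^n|X_2^n,S^n) - H(X_1^n|Y^n,X_2^n,S^n) + n\epsilon_n \\
&= I(X_1^n;Y^n|X_2^n,S^n) + n\epsilon_n \\
&= H(Y^n|X_2^n,S^n) - H(Y^n|X_1^n,X_2^n,S^n) + n\epsilon_n \\
&= \sum_{i=1}^{n} \left[ H(Y_i|Y^{i-1},X_2^n,S^n) - H(Y_i|Y^{i-1},X_1^n,X_2^n,S^n) \right] + n\epsilon_n \\
&\overset{(g)}{\leq} \sum_{i=1}^{n} \left[ H(Y_i|X_{2,i},S_i,S_{i-d_2},S_{i-d_1},S^{i-d_1-1}) - H(Y_i|Y^{i-1},X_1^n,X_2^n,S^n) \right] + n\epsilon_n \\
&\overset{(h)}{=} \sum_{i=1}^{n} \left[ H(Y_i|X_{2,i},S_i,S_{i-d_2},S_{i-d_1},S^{i-d_1-1}) - H(Y_i|X_{1,i},X_{2,i},S_i,S_{i-d_2},S_{i-d_1},S^{i-d_1-1}) \right] + n\epsilon_n \\
&= \sum_{i=1}^{n} I(Y_i;X_{1,i}|X_{2,i},S_i,S_{i-d_2},S_{i-d_1},S^{i-d_1-1}) + n\epsilon_n,
\end{aligned}$$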

where 

(a) follows from Fano's inequality. 

(b) follows from the chain rule. 

(c) follows from the fact that $M_1$ and $S^n$ are independent. 

(d) follows from the fact that $X_1^n$ is a deterministic function of $(M_1,S^n)$ and the Markov chain $(M_1,S^n) - 
(X_1^n,S^n) - Y^n$. 

(e) follows from the fact that $X_1^n$ and $M_2$ are independent, and the fact that $X_2^n$ is a deterministic function of 
$(M_2,S^n)$. Therefore, $X_1^n$ and $X_2^n$ are independent given $S^n$. 

(f) and (g) follow from the fact that conditioning reduces entropy. 

(h) follows from the fact that the channel output at time $i$ depends only on the state $S_i$ and the inputs $X_{1,i}$ and 
$X_{2,i}$. 

Hence, we have 

$$R_1 \leq \frac{1}{n}\sum_{i=1}^{n} I(Y_i;X_{1,i}|X_{2,i},S_i,S_{i-d_2},S_{i-d_1},S^{i-d_1-1}) + \epsilon_n. \tag{16}$$

Similarly, we have 

$$R_2 \leq \frac{1}{n}\sum_{i=1}^{n} I(Y_i;X_{2,i}|X_{1,i},S_i,S_{i-d_2},S_{i-d_1},S^{i-d_1-1}) + \epsilon_n. \tag{17}$$

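To bound the sum of the rates, consider 

$$\begin{aligned}
n(R_1+R_2) &= H(M_1,M_2) \\
&\overset{(a)}{\leq} I(M_1,M_2;Y^n,S^n) + n\epsilon_n \\
&\overset{(b)}{=} I(M_1,M_2;Y^n|S^n) + I(M_1,M_2;S^n) + n\epsilon_n \\
&\overset{(c)}{=} I(M_1,M_2;Y^n|S^n) + n\epsilon_n \\
&\overset{(d)}{\leq} I(X_1^n,X_2^n;Y^n|S^n) + n\epsilon_n \\
&= H(Y^n|S^n) - H(Y^n|X_1^n,X_2^n,S^n) + n\epsilon_n \\
&\overset{(e)}{=} \sum_{i=1}^{n} \left[ H(Y_i|Y^{i-1},S^n) - H(Y_i|X_{1,i},X_{2,i},S_i,S_{i-d_2},S_{i-d_1},S^{i-d_1-1}) \right] + n\epsilon_n \\
&\overset{(f)}{\leq} \sum_{i=1}^{n} \left[ H(Y_i|S_i,S_{i-d_2},S_{i-d_1},S^{i-d_1-1}) - H(Y_i|X_{1,i},X_{2,i},S_i,S_{i-d_2},S_{i-d_1},S^{i-d_1-1}) \right] + n\epsilon_n \\
&= \sum_{i=1}^{n} I(Y_i;X_{1,i},X_{2,i}|S_i,S_{i-d_2},S_{i-d_1},S^{i-d_1-1}) + n\epsilon_n,
\end{aligned}$$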

where 

(a) follows from Fano's inequality. 

(b) follows from the chain rule. 

(c) follows from the fact that $M_1$, $M_2$, and $S^n$ are independent. 

(d) follows from the fact that $(X_1^n,X_2^n)$ is a deterministic function of $(M_1,M_2,S^n)$ and the Markov chain 
$(M_1,M_2,S^n) - (X_1^n,X_2^n,S^n) - Y^n$. 

(e) follows from the fact that the channel output at time $i$ depends only on the state $S_i$ and the inputs $X_{1,i}$ and 
$X_{2,i}$. 

(f) follows from the fact that conditioning reduces entropy. 
Hence, we have 

$$R_1+R_2 \leq \frac{1}{n}\sum_{i=1}^{n} I(Y_i;X_{1,i},X_{2,i}|S_i,S_{i-d_2},S_{i-d_1},S^{i-d_1-1}) + \epsilon_n. \tag{18}$$

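The expressions in (16), (17), and (18) are the averages of the mutual informations calculated at the empirical 
distribution in column $i$ of the codebook. We can rewrite these equations with the new variable $Q$, where $Q = i \in 
\{1, 2, \ldots, n\}$ with probability $\frac{1}{n}$. The equations become 

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} & I(Y_i;X_{1,i}|X_{2,i},S_i,S_{i-d_2},S_{i-d_1},S^{i-d_1-1}) + \epsilon_n \\
&= \frac{1}{n}\sum_{i=1}^{n} I(Y_Q;X_{1,Q}|X_{2,Q},S_Q,S_{Q-d_2},S_{Q-d_1},S^{Q-d_1-1},Q = i) + \epsilon_n \\
&= I(Y_Q;X_{1,Q}|X_{2,Q},S_Q,S_{Q-d_2},S_{Q-d_1},S^{Q-d_1-1},Q) + \epsilon_n. 
\end{aligned} \tag{19}$$

Now let us denote $X_1 \triangleq X_{1,Q}$, $X_2 \triangleq X_{2,Q}$, $Y \triangleq Y_Q$, $S \triangleq S_Q$, $\tilde{S}_1 \triangleq S_{Q-d_1}$, $\tilde{S}_2 \triangleq S_{Q-d_2}$, and $U \triangleq (S^{Q-d_1-1},Q)$. 

We have 

$$\begin{aligned} R_1 &\leq I(X_1;Y|X_2,S,\tilde{S}_1,\tilde{S}_2,U) + \epsilon_n, \\ R_2 &\leq I(X_2;Y|X_1,S,\tilde{S}_1,\tilde{S}_2,U) + \epsilon_n, \\ R_1+R_2 &\leq I(X_1,X_2;Y|S,\tilde{S}_1,\tilde{S}_2,U) + \epsilon_n. \end{aligned}$$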
To complete the converse proof we need to show that the following Markov relations hold: 

1) $P(u|s,\tilde{s}_1,\tilde{s}_2) = P(u|\tilde{s}_1)$. 

2) $P(x_1|s,\tilde{s}_1,\tilde{s}_2,u) = P(x_1|\tilde{s}_1,u)$. 

3) $P(x_2|x_1,s,\tilde{s}_1,\tilde{s}_2,u) = P(x_2|\tilde{s}_1,\tilde{s}_2,u)$. 

4) $P(y|x_1,x_2,s,\tilde{s}_1,\tilde{s}_2,u) = P(y|x_1,x_2,s)$. 

We prove the above using the following claims: 

1) follows from the fact that $S^{i-d_1-1} - S_{i-d_1} - S_{i-d_2} - S_i$ is a Markov chain, and so is $(S^{Q-d_1-1},Q) - S_{Q-d_1} - S_{Q-d_2} - S_Q$. 

2) follows from the fact that $X_{1,i} = f_{1,i}(M_1,S^{i-d_1})$ and that $M_1$ and $S^n$ are independent. Hence 

$$P(x_{1,q}|s_q,s_{q-d_1},s_{q-d_2},s^{q-d_1-1},q = i) = P(x_{1,q}|s_{q-d_1},s^{q-d_1-1},q = i).$$

Since this is true for all $i$, 

$$P(x_{1,q}|s_q,s_{q-d_1},s_{q-d_2},s^{q-d_1-1},q) = P(x_{1,q}|s_{q-d_1},s^{q-d_1-1},q).$$

Therefore we have 

$$P(x_1|s,\tilde{s}_1,\tilde{s}_2,u) = P(x_1|\tilde{s}_1,u).$$

3) We assume that $d_1 \geq d_2$. Since $M_2$ and $(M_1,S^n)$ are independent, and the state process is a Markov chain, 
we have 

$$P(m_2,s^{i-d_2}|s_i,s_{i-d_1},s_{i-d_2},s^{i-d_1-1},m_1) = P(m_2,s^{i-d_2}|s_{i-d_1},s_{i-d_2},s^{i-d_1-1}).$$

Therefore, we have the Markov chain $(M_2,S^{i-d_2}) - (S_{i-d_1},S_{i-d_2},S^{i-d_1-1}) - (M_1,S_i,S^{i-d_1})$. Since $X_{1,i} = 
f_{1,i}(M_1,S^{i-d_1})$ and $X_{2,i} = f_{2,i}(M_2,S^{i-d_2})$, where $f_{1,i}$, $f_{2,i}$ are deterministic functions, we obtain the 
following Markov chain, 

$$X_{2,i} - (M_2,S^{i-d_2}) - (S_{i-d_1},S_{i-d_2},S^{i-d_1-1}) - (M_1,S_i,S^{i-d_1}) - X_{1,i}, \tag{20}$$

which implies 

$$P(x_{2,q}|x_{1,q},s_q,s_{q-d_1},s_{q-d_2},s^{q-d_1-1},q = i) = P(x_{2,q}|s_{q-d_1},s_{q-d_2},s^{q-d_1-1},q = i).$$

Since this is true for all $i$, 

$$P(x_{2,q}|x_{1,q},s_q,s_{q-d_1},s_{q-d_2},s^{q-d_1-1},q) = P(x_{2,q}|s_{q-d_1},s_{q-d_2},s^{q-d_1-1},q).$$

Therefore we have $P(x_2|x_1,s,\tilde{s}_1,\tilde{s}_2,u) = P(x_2|\tilde{s}_1,\tilde{s}_2,u)$. 

4) follows from the fact that the channel output at any time $i$ is assumed to depend only on the channel inputs 
and state at time $i$. 

Hence, taking the limit as $n \to \infty$, $P_e^{(n)} \to 0$, we have the following converse: 

$$\begin{aligned} R_1 &\leq I(X_1;Y|X_2,S,\tilde{S}_1,\tilde{S}_2,U), \\ R_2 &\leq I(X_2;Y|X_1,S,\tilde{S}_1,\tilde{S}_2,U), \\ R_1+R_2 &\leq I(X_1,X_2;Y|S,\tilde{S}_1,\tilde{S}_2,U), \end{aligned}$$

for some choice of joint distribution $P(s,\tilde{s}_1,\tilde{s}_2)P(u|\tilde{s}_1)P(x_1|\tilde{s}_1,u)P(x_2|\tilde{s}_1,\tilde{s}_2,u)P(y|x_1,x_2,s)$ and for some 
choice of auxiliary random variable $U$ with $|\mathcal{U}| \leq 3$. This completes the proof of the converse. ■ 

V. PROOF OF THE ACHIEVABILITY OF THEOREM 1 

In the previous section we proved the converse of the capacity region of Theorem 1. In this section we prove 
the achievability part. The main idea of the proof is to use multiplexing coding, i.e., multiplexing the input of 
the channel at each encoder (the multiplexer is controlled by the delayed CSI), and then, using the CSI known at the 
decoder, demultiplexing the output at the decoder. 

Proof: To prove the achievability of the capacity region, we need to show that for a fixed $P(x_1|\tilde{s}_1)P(x_2|\tilde{s}_1,\tilde{s}_2)$ 
and $(R_1,R_2)$ that satisfy 

$$\begin{aligned} R_1 &< I(X_1;Y|X_2,S,\tilde{S}_1,\tilde{S}_2), \\ R_2 &< I(X_2;Y|X_1,S,\tilde{S}_1,\tilde{S}_2), \\ R_1+R_2 &< I(X_1,X_2;Y|S,\tilde{S}_1,\tilde{S}_2), \end{aligned}$$

there exists a sequence of $(n, 2^{nR_1}, 2^{nR_2}, d_1, d_2)$ codes where $P_e^{(n)} \to 0$ as $n \to \infty$. Without loss of generality, 
we assume that the finite-state space is $\mathcal{S} = \{1, 2, \ldots, k\}$, and that the steady state probability $\pi(l) > 0$ for all $l \in \mathcal{S}$. 

Encoder 1: construct $k$ codebooks $C_1^{\tilde{s}_1}$ (where the subscript is for Encoder 1) for all $\tilde{s}_1 \in \mathcal{S}$, where in each 
codebook $C_1^{\tilde{s}_1}$ there are $2^{n_1(\tilde{s}_1)R_1(\tilde{s}_1)}$ codewords, where $n_1(\tilde{s}_1) = (P(\tilde{S}_1 = \tilde{s}_1) - \epsilon')n$, for $\epsilon' > 0$. Every codeword 
$C_1^{\tilde{s}_1}(i)$, where $i \in \{1, 2, \ldots, 2^{n_1(\tilde{s}_1)R_1(\tilde{s}_1)}\}$, has a length of $n_1(\tilde{s}_1)$ symbols. Each codeword from the $C_1^{\tilde{s}_1}$ codebook is 
built as $X_1^{\tilde{s}_1} \sim$ i.i.d. $P(x_1|\tilde{S}_1 = \tilde{s}_1)$ (where the subscript is for Encoder 1). A message $M_1$ is chosen according to 
a uniform distribution $\Pr(M_1 = m_1) = 2^{-nR_1}$, $m_1 \in \{1, 2, \ldots, 2^{nR_1}\}$. Every message $m_1$ is mapped into $k$ 
sub-messages $V_1(m_1) = \{V_1^1(m_1), V_1^2(m_1), \ldots, V_1^k(m_1)\}$ (one message from each codebook). Hence, every message 
$m_1$ is specified by a $k$-dimensional vector. For a fixed block length $n$, let $N_{\tilde{s}_1}$ be the number of times during the $n$ 
symbols for which the feedback information at encoder 1 regarding the channel state is $\tilde{S}_1 = \tilde{s}_1$. Every time that 
the delayed CSI is $\tilde{S}_1 = \tilde{s}_1$, encoder 1 sends the next symbol from the $C_1^{\tilde{s}_1}$ codebook. Since $N_{\tilde{s}_1}$ is not necessarily 
equal to $n_1(\tilde{s}_1)$, an error is declared if $N_{\tilde{s}_1} < n_1(\tilde{s}_1)$, and the code is zero-filled if $N_{\tilde{s}_1} > n_1(\tilde{s}_1)$. Therefore, 
we can send a total of $2^{nR_1} = 2^{\sum_{\tilde{s}_1 \in \mathcal{S}} n_1(\tilde{s}_1)R_1(\tilde{s}_1)}$ messages. 
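
The multiplexing rule for Encoder 1, together with the matching demultiplexing step at the decoder, can be sketched as follows. This is a simplified illustration rather than the construction used in the proof: the function names are hypothetical, each per-state codeword is passed in directly (the random codebook generation and message-to-sub-message mapping are omitted), and the zero-fill symbol is taken to be 0.

```python
def multiplex_encoder(codewords, delayed_states):
    """Interleave per-state codewords according to the delayed state sequence.

    codewords: dict mapping each state to the codeword (a list of symbols)
        selected from that state's codebook for the current message.
    delayed_states: the delayed CSI seen by the encoder, one state per channel use.
    Returns the transmitted sequence, or None if an error is declared.
    """
    position = {s: 0 for s in codewords}  # next unsent symbol of each codeword
    sent = []
    for s in delayed_states:
        word = codewords[s]
        if position[s] < len(word):
            sent.append(word[position[s]])
            position[s] += 1
        else:
            sent.append(0)  # zero-fill: the state occurred more often than the codeword length
    # An error is declared if some state occurred fewer times than its codeword length.
    if any(position[s] < len(codewords[s]) for s in codewords):
        return None
    return sent

def demultiplex(outputs, delayed_states):
    """Group received symbols by the delayed state that was in force when they were sent."""
    groups = {}
    for y, s in zip(outputs, delayed_states):
        groups.setdefault(s, []).append(y)
    return groups

# Hypothetical example: two states, codeword lengths 3 and 2, block length 6.
codewords = {0: [1, 1, 0], 1: [0, 1]}
states = [0, 1, 0, 0, 1, 0]
sent = multiplex_encoder(codewords, states)
groups = demultiplex(sent, states)
```

In this example the last channel use is zero-filled because state 0 occurs four times while its codeword has only three symbols; a state sequence in which state 1 never occurs would instead cause an error to be declared.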

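Encoder 2: construct $k \times k$ codebooks $C_2^{\tilde{s}_1,\tilde{s}_2}$ (where the subscript is for Encoder 2) for all $(\tilde{s}_1,\tilde{s}_2) \in \mathcal{S} \times \mathcal{S}$, 
where in each codebook $C_2^{\tilde{s}_1,\tilde{s}_2}$ there are $2^{n_2(\tilde{s}_1,\tilde{s}_2)R_2(\tilde{s}_1,\tilde{s}_2)}$ codewords, where $n_2(\tilde{s}_1,\tilde{s}_2) = (P(\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2 = 
\tilde{s}_2) - \epsilon')n$, for $\epsilon' > 0$. Every codeword $C_2^{\tilde{s}_1,\tilde{s}_2}(i)$, where $i \in \{1, 2, \ldots, 2^{n_2(\tilde{s}_1,\tilde{s}_2)R_2(\tilde{s}_1,\tilde{s}_2)}\}$, has a length of 
$n_2(\tilde{s}_1,\tilde{s}_2)$ symbols. Each codeword from the $C_2^{\tilde{s}_1,\tilde{s}_2}$ codebook is built as $X_2^{\tilde{s}_1,\tilde{s}_2} \sim$ i.i.d. $P(x_2|(\tilde{S}_1,\tilde{S}_2) = 
(\tilde{s}_1,\tilde{s}_2))$ (where the subscript is for Encoder 2). A message $M_2$ is chosen according to a uniform distribution 
$\Pr(M_2 = m_2) = 2^{-nR_2}$, $m_2 \in \{1, 2, \ldots, 2^{nR_2}\}$. Every message $m_2$ is mapped into $k \times k$ sub-messages 
$V_2(m_2) = \{V_2^{1,1}(m_2), V_2^{1,2}(m_2), \ldots, V_2^{k,k}(m_2)\}$ (one message from each codebook). Hence, every message $m_2$ 
is specified by a $k \times k$ dimensional vector. For a fixed block length $n$, let $N_{\tilde{s}_1,\tilde{s}_2}$ be the number of times during 
the $n$ symbols for which the feedback information at encoder 2 regarding the channel state is $(\tilde{S}_1,\tilde{S}_2) = (\tilde{s}_1,\tilde{s}_2)$. 
Every time that the delayed CSI is $(\tilde{S}_1,\tilde{S}_2) = (\tilde{s}_1,\tilde{s}_2)$, encoder 2 sends the next symbol from the $C_2^{\tilde{s}_1,\tilde{s}_2}$ codebook. 
Since $N_{\tilde{s}_1,\tilde{s}_2}$ is not necessarily equal to $n_2(\tilde{s}_1,\tilde{s}_2)$, an error is declared if $N_{\tilde{s}_1,\tilde{s}_2} < n_2(\tilde{s}_1,\tilde{s}_2)$, and the code is 
zero-filled if $N_{\tilde{s}_1,\tilde{s}_2} > n_2(\tilde{s}_1,\tilde{s}_2)$. Therefore, we can send a total of $2^{nR_2} = 2^{\sum_{(\tilde{s}_1,\tilde{s}_2) \in \mathcal{S} \times \mathcal{S}} n_2(\tilde{s}_1,\tilde{s}_2)R_2(\tilde{s}_1,\tilde{s}_2)}$ messages. 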

Decoding: We use successive decoding; in this method, instead of decoding the two messages simultaneously, 
the decoder first decodes one of the messages by itself, where the other user's message is considered as noise. After 
decoding the first user's message, the decoder turns to decode the second message. When decoding the second 
message, the decoder uses the information about the first message as side information. This decoding rule aims to 
achieve the two corner points of the rate region, i.e., $(R_1 = I(X_1;Y|X_2,S,\tilde{S}_1,\tilde{S}_2) - \epsilon,\ R_2 = I(X_2;Y|S,\tilde{S}_1,\tilde{S}_2) - \epsilon)$ 
and $(R_1 = I(X_1;Y|S,\tilde{S}_1,\tilde{S}_2) - \epsilon,\ R_2 = I(X_2;Y|X_1,S,\tilde{S}_1,\tilde{S}_2) - \epsilon)$. The rate region is illustrated in Fig. 2. 

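Fig. 2: The rate region. The horizontal axis is $R_1$, with marked values $I(X_1;Y|S,\tilde{S}_1,\tilde{S}_2)$ and $I(X_1;Y|X_2,S,\tilde{S}_1,\tilde{S}_2)$; 
the vertical axis is $R_2$, with marked values $I(X_2;Y|S,\tilde{S}_1,\tilde{S}_2)$ and $I(X_2;Y|X_1,S,\tilde{S}_1,\tilde{S}_2)$. 

To achieve the first point, let us analyze the case where the decoder first decodes $X_2^n$. The information 
$(\tilde{S}_1,\tilde{S}_2)$ used to multiplex the codewords at the encoder is also available at the decoder. Hence, upon receiving 
a block of channel outputs and states $(Y^n,S^n)$, the decoder first demultiplexes it into outputs corresponding 
to the component codebooks of encoder 2. Then, the decoder separately decodes each component codeword 
$V_2^{\tilde{s}_1,\tilde{s}_2}$, where $(\tilde{s}_1,\tilde{s}_2) \in \mathcal{S} \times \mathcal{S}$. For each codebook $C_2^{\tilde{s}_1,\tilde{s}_2}$, the decoder has $(Y^{n_2(\tilde{s}_1,\tilde{s}_2)}, S^{n_2(\tilde{s}_1,\tilde{s}_2)})$ and 
searches for $X_2^{n_2(\tilde{s}_1,\tilde{s}_2)}$ such that $(X_2^{n_2(\tilde{s}_1,\tilde{s}_2)}, Y^{n_2(\tilde{s}_1,\tilde{s}_2)}, S^{n_2(\tilde{s}_1,\tilde{s}_2)})$ are strongly jointly typical sequences [14], i.e., 
$(X_2^{n_2(\tilde{s}_1,\tilde{s}_2)}, Y^{n_2(\tilde{s}_1,\tilde{s}_2)}, S^{n_2(\tilde{s}_1,\tilde{s}_2)}) \in A_\epsilon^{*(n_2(\tilde{s}_1,\tilde{s}_2))}(X_2,Y,S)$ given $(\tilde{S}_1,\tilde{S}_2) = (\tilde{s}_1,\tilde{s}_2)$. The decoder declares that 
$\hat{m}_2$ is sent if it is a unique message such that $(X_2^{n_2(\tilde{s}_1,\tilde{s}_2)}(\hat{m}_2), Y^{n_2(\tilde{s}_1,\tilde{s}_2)}, S^{n_2(\tilde{s}_1,\tilde{s}_2)}) \in A_\epsilon^{*(n_2(\tilde{s}_1,\tilde{s}_2))}(X_2,Y,S)$ 
given $(\tilde{S}_1,\tilde{S}_2) = (\tilde{s}_1,\tilde{s}_2)$ for all $(\tilde{s}_1,\tilde{s}_2) \in \mathcal{S} \times \mathcal{S}$; otherwise it declares an error. If such $\hat{m}_2$ is found, the decoder has 
$X_2^n(\hat{m}_2)$, and now the decoder uses the information $\tilde{S}_1$ to demultiplex $(Y^n,S^n)$ into outputs corresponding to 
the component codebooks of encoder 1 (which has $k$ codebooks). The decoder declares that $\hat{m}_1$ is sent if it is a 
unique message such that $(X_1^{n_1(\tilde{s}_1)}(\hat{m}_1), X_2^{n_1(\tilde{s}_1)}(\hat{m}_2), Y^{n_1(\tilde{s}_1)}, S^{n_1(\tilde{s}_1)}) \in A_\epsilon^{*(n_1(\tilde{s}_1))}(X_1,X_2,Y,S)$ given $\tilde{S}_1 = \tilde{s}_1$ 
for all $\tilde{s}_1 \in \mathcal{S}$; otherwise it declares an error. 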

Analysis of the probability of error: First, we analyze the probability of error for the component codeword $V_2^{\tilde{s}_1,\tilde{s}_2}$ 
at encoder 2, i.e., $\Pr(N_{\tilde{s}_1,\tilde{s}_2} < n_2(\tilde{s}_1,\tilde{s}_2))$. Since the state process is stationary and ergodic, $\lim_{n \to \infty} \frac{N_{\tilde{s}_1,\tilde{s}_2}}{n} = 
P(\tilde{s}_1,\tilde{s}_2)$ in probability. Therefore, $\Pr(N_{\tilde{s}_1,\tilde{s}_2} < n_2(\tilde{s}_1,\tilde{s}_2)) \to 0$ as $n \to \infty$. Now, we analyze the probability 
of incorrectly decoding the component codeword $V_2^{\tilde{s}_1,\tilde{s}_2}$ that was sent from the $C_2^{\tilde{s}_1,\tilde{s}_2}$ codebook of encoder 2. 
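
The ergodicity argument above says that the fraction of time the chain spends in each state converges to its steady-state probability, so the counts $N$ eventually exceed the slightly smaller codeword lengths. A small simulation can illustrate this; the two-state transition matrix, the seed, and the helper names below are assumptions made for the example, with `k_mat[a][b]` taken as the probability of moving from state `a` to state `b`.

```python
import random

def simulate_states(k_mat, n, seed=0):
    """Sample n steps of a Markov chain with row-stochastic matrix k_mat, starting in state 0."""
    rng = random.Random(seed)
    state = 0
    path = []
    for _ in range(n):
        path.append(state)
        state = rng.choices(range(len(k_mat)), weights=k_mat[state])[0]
    return path

def empirical_frequency(path, state):
    """Fraction of time steps spent in the given state, i.e. N_s / n."""
    return path.count(state) / len(path)

# Hypothetical two-state chain whose steady-state distribution is (2/3, 1/3).
K = [[0.9, 0.1],
     [0.2, 0.8]]
path = simulate_states(K, 200_000)
freq0 = empirical_frequency(path, 0)
```

For long blocks `freq0` is close to $2/3$, so a codeword length of $(2/3 - \epsilon')n$ is met with high probability for any fixed $\epsilon' > 0$.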

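Without loss of generality, we can assume that the first codeword was sent from the $C_2^{\tilde{s}_1,\tilde{s}_2}$ codebook of encoder 
2, which we denote by $C_2^{\tilde{s}_1,\tilde{s}_2}(1)$. Since $S^{n_2(\tilde{s}_1,\tilde{s}_2)}$ is ergodic, by the law of large numbers (L.L.N.), as $n_2(\tilde{s}_1,\tilde{s}_2) \to \infty$ we have 
$\Pr\{S^{n_2(\tilde{s}_1,\tilde{s}_2)} \in A_\epsilon^{*(n_2(\tilde{s}_1,\tilde{s}_2))}(S)\} \to 1$. By the construction of the codebook $C_2^{\tilde{s}_1,\tilde{s}_2}(1)$, $X_2$ and $S$ are independent 
given $(\tilde{S}_1,\tilde{S}_2) = (\tilde{s}_1,\tilde{s}_2)$. Hence $X_2^{n_2(\tilde{s}_1,\tilde{s}_2)}(1)$ and $S^{n_2(\tilde{s}_1,\tilde{s}_2)}$ are strongly jointly typical sequences with probability 
approaching 1. Finally, from the codebook construction and the channel transition probability we have that 

$$\begin{aligned}
P(y_i|x_2^i,s^i,\tilde{s}_1,\tilde{s}_2) &= \sum_{x_{1,i}} p(x_{1,i}|x_2^i,s^i,\tilde{s}_1,\tilde{s}_2)\,p(y_i|x_{1,i},x_2^i,s^i,\tilde{s}_1,\tilde{s}_2) \\
&= \sum_{x_{1,i} \in \mathcal{X}_1} p(x_{1,i}|\tilde{s}_1,\tilde{s}_2)\,p(y_i|x_{1,i},x_{2,i},s_i,\tilde{s}_1,\tilde{s}_2) \\
&= P(y_i|x_{2,i},s_i,\tilde{s}_1,\tilde{s}_2).
\end{aligned} \tag{21}$$

Now using the fact that $p(y_i|x_2^i,s^i,\tilde{s}_1,\tilde{s}_2) = p(y_i|x_{2,i},s_i,\tilde{s}_1,\tilde{s}_2)$, and the L.L.N., we have 
$\Pr\{(X_2^{n_2(\tilde{s}_1,\tilde{s}_2)}(1), S^{n_2(\tilde{s}_1,\tilde{s}_2)}, Y^{n_2(\tilde{s}_1,\tilde{s}_2)}) \in A_\epsilon^{*(n_2(\tilde{s}_1,\tilde{s}_2))}(X_2,Y,S) \mid (\tilde{S}_1,\tilde{S}_2) = (\tilde{s}_1,\tilde{s}_2)\} \to 1$ as $n_2(\tilde{s}_1,\tilde{s}_2) \to \infty$. A 
decoding error occurs only if 

$$E_1 = \left\{ (X_2^{n_2(\tilde{s}_1,\tilde{s}_2)}(1), Y^{n_2(\tilde{s}_1,\tilde{s}_2)}, S^{n_2(\tilde{s}_1,\tilde{s}_2)}) \notin A_\epsilon^{*(n_2(\tilde{s}_1,\tilde{s}_2))}(X_2,Y,S) \mid (\tilde{S}_1,\tilde{S}_2) = (\tilde{s}_1,\tilde{s}_2) \right\}, \tag{22}$$

$$E_2 = \left\{ \exists\, i \neq 1 : (X_2^{n_2(\tilde{s}_1,\tilde{s}_2)}(i), Y^{n_2(\tilde{s}_1,\tilde{s}_2)}, S^{n_2(\tilde{s}_1,\tilde{s}_2)}) \in A_\epsilon^{*(n_2(\tilde{s}_1,\tilde{s}_2))}(X_2,Y,S) \mid (\tilde{S}_1,\tilde{S}_2) = (\tilde{s}_1,\tilde{s}_2) \right\}. \tag{23}$$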

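Then by the union of events bound, 

$$P_e^{(n_2(\tilde{s}_1,\tilde{s}_2))} = \Pr(E_1 \cup E_2) \leq P(E_1) + P(E_2). \tag{24}$$

Now let us find the probability of each event. 

1) $P(E_1)$: As mentioned above, as $n_2(\tilde{s}_1,\tilde{s}_2) \to \infty$ we have 

$$P(E_1) \to 0.$$

2) $P(E_2)$: for $i \neq 1$, the probability of error satisfies 

$$\begin{aligned}
P(E_2) &\leq \sum_{i=2}^{2^{n_2(\tilde{s}_1,\tilde{s}_2)R_2(\tilde{s}_1,\tilde{s}_2)}} \Pr\left( (X_2^{n_2(\tilde{s}_1,\tilde{s}_2)}(i), Y^{n_2(\tilde{s}_1,\tilde{s}_2)}, S^{n_2(\tilde{s}_1,\tilde{s}_2)}) \in A_\epsilon^{*(n_2(\tilde{s}_1,\tilde{s}_2))} \mid (\tilde{S}_1,\tilde{S}_2) = (\tilde{s}_1,\tilde{s}_2) \right) \\
&\leq 2^{n_2(\tilde{s}_1,\tilde{s}_2)R_2(\tilde{s}_1,\tilde{s}_2)} \cdot 2^{-n_2(\tilde{s}_1,\tilde{s}_2)\left(I(X_2;Y,S|\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2 = \tilde{s}_2) - \epsilon\right)}.
\end{aligned} \tag{25}$$

For $P(E_2) \to 0$ as $n_2(\tilde{s}_1,\tilde{s}_2) \to \infty$, we need to choose 

$$\begin{aligned}
R_2(\tilde{s}_1,\tilde{s}_2) &< I(X_2;Y,S|\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2 = \tilde{s}_2) - \epsilon \\
&= I(X_2;Y|S,\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2 = \tilde{s}_2) + I(X_2;S|\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2 = \tilde{s}_2) - \epsilon \\
&\overset{(a)}{=} I(X_2;Y|S,\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2 = \tilde{s}_2) - \epsilon,
\end{aligned} \tag{26}$$

where (a) follows from the independence of $X_2$ and $S$ given $(\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2 = \tilde{s}_2)$. 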
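Similarly, we can analyze the probability of error for the rest of the codebooks of encoder 2, i.e., $C_2^{\tilde{s}_1,\tilde{s}_2}$ for every 
$(\tilde{s}_1,\tilde{s}_2) \in \mathcal{S} \times \mathcal{S}$. Therefore, as $n \to \infty$, 

$$\begin{aligned}
R_2 &\leq \sum_{\tilde{s}_1,\tilde{s}_2} \frac{n_2(\tilde{s}_1,\tilde{s}_2)}{n} R_2(\tilde{s}_1,\tilde{s}_2) \\
&< \sum_{\tilde{s}_1,\tilde{s}_2} \frac{n_2(\tilde{s}_1,\tilde{s}_2)}{n} \left( I(X_2;Y|S,\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2 = \tilde{s}_2) - \epsilon \right) \\
&= \sum_{\tilde{s}_1,\tilde{s}_2} \left( P(\tilde{s}_1,\tilde{s}_2) - \epsilon' \right) \left( I(X_2;Y|S,\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2 = \tilde{s}_2) - \epsilon \right) \\
&= I(X_2;Y|S,\tilde{S}_1,\tilde{S}_2) - \epsilon'',
\end{aligned} \tag{27}$$

where $\epsilon'' = \epsilon + \epsilon' \sum_{\tilde{s}_1,\tilde{s}_2} \left( I(X_2;Y|S,\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2 = \tilde{s}_2) - \epsilon \right)$. 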
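
The last step of (27) is a piece of arithmetic that is easy to check numerically. Writing $P_s$ for the probability of a delayed-state value and $I_s$ for the corresponding conditional mutual information, expanding $\sum_s (P_s - \epsilon')(I_s - \epsilon)$ gives $\sum_s P_s I_s - \epsilon''$ with slack $\epsilon'' = \epsilon + \epsilon' \sum_s (I_s - \epsilon)$, where the cross term $\epsilon\epsilon'$ is counted once per state. The sketch below uses hypothetical probabilities and mutual information values purely to verify this identity.

```python
def averaged_rate(p, info, eps, eps_prime):
    """Sum over states of (P_s - eps') * (I_s - eps)."""
    return sum((p_s - eps_prime) * (i_s - eps) for p_s, i_s in zip(p, info))

def slack(info, eps, eps_prime):
    """eps'' such that averaged_rate equals sum_s P_s * I_s - eps''."""
    return eps + eps_prime * sum(i_s - eps for i_s in info)

# Hypothetical values for four delayed-state pairs.
p = [0.4, 0.3, 0.2, 0.1]          # probabilities, summing to 1
info = [0.9, 0.7, 0.5, 0.2]       # per-state mutual informations in bits
eps, eps_prime = 0.01, 0.005

target = sum(p_s * i_s for p_s, i_s in zip(p, info))
lhs = averaged_rate(p, info, eps, eps_prime)
rhs = target - slack(info, eps, eps_prime)
```

Here `lhs` and `rhs` agree up to floating-point error, and the slack vanishes as both `eps` and `eps_prime` go to zero, which is what allows the rate to approach the mutual information.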

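Let us analyze the probability of error for the component codeword $V_1^{\tilde{s}_1}$. As mentioned above, since the state 
process is stationary and ergodic, $\lim_{n \to \infty} \frac{N_{\tilde{s}_1}}{n} = P(\tilde{s}_1)$ in probability. Therefore, the probability that an error is 
declared at encoder 1 satisfies $\Pr(N_{\tilde{s}_1} < n_1(\tilde{s}_1)) \to 0$ as $n \to \infty$. Now, we analyze the probability of incorrectly decoding 
the component codeword $V_1^{\tilde{s}_1}$ that was sent from the $C_1^{\tilde{s}_1}$ codebook of encoder 1 after $V_2$ was decoded correctly. 
Without loss of generality, we can assume that the first codeword was sent from the $C_1^{\tilde{s}_1}$ codebook of encoder 1, i.e., 
$C_1^{\tilde{s}_1}(1)$ was sent. Again, from the ergodicity of $S^{n_1(\tilde{s}_1)}$, the construction of the codebooks, and the channel transition 
probability, we have that $\Pr\{(X_1^{n_1(\tilde{s}_1)}(1), X_2^{n_1(\tilde{s}_1)}(M_2), Y^{n_1(\tilde{s}_1)}, S^{n_1(\tilde{s}_1)}) \in A_\epsilon^{*(n_1(\tilde{s}_1))}(X_1,X_2,Y,S) \mid \tilde{S}_1 = \tilde{s}_1\} \to 1$ 
as $n_1(\tilde{s}_1) \to \infty$. A decoding error occurs only if 

$$E_3 = \left\{ (X_1^{n_1(\tilde{s}_1)}(1), X_2^{n_1(\tilde{s}_1)}(M_2), Y^{n_1(\tilde{s}_1)}, S^{n_1(\tilde{s}_1)}) \notin A_\epsilon^{*(n_1(\tilde{s}_1))}(X_1,X_2,Y,S) \mid \tilde{S}_1 = \tilde{s}_1 \right\}, \tag{28}$$

$$E_4 = \left\{ \exists\, i \neq 1 : (X_1^{n_1(\tilde{s}_1)}(i), X_2^{n_1(\tilde{s}_1)}(M_2), Y^{n_1(\tilde{s}_1)}, S^{n_1(\tilde{s}_1)}) \in A_\epsilon^{*(n_1(\tilde{s}_1))}(X_1,X_2,Y,S) \mid \tilde{S}_1 = \tilde{s}_1 \right\}. \tag{29}$$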

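Then by the union of events bound, 

$$P_e^{(n_1(\tilde{s}_1))} = \Pr(E_3 \cup E_4) \leq P(E_3) + P(E_4). \tag{30}$$

Now let us find the probability of each event. 

1) $P(E_3)$: As mentioned above, as $n_1(\tilde{s}_1) \to \infty$ we have 

$$P(E_3) \to 0.$$

2) $P(E_4)$: for $i \neq 1$, the probability of error satisfies 

$$\begin{aligned}
P(E_4) &\leq \sum_{i=2}^{2^{n_1(\tilde{s}_1)R_1(\tilde{s}_1)}} \Pr\left( (X_1^{n_1(\tilde{s}_1)}(i), X_2^{n_1(\tilde{s}_1)}(M_2), Y^{n_1(\tilde{s}_1)}, S^{n_1(\tilde{s}_1)}) \in A_\epsilon^{*(n_1(\tilde{s}_1))} \mid \tilde{S}_1 = \tilde{s}_1 \right) \\
&\leq 2^{n_1(\tilde{s}_1)R_1(\tilde{s}_1)} \cdot 2^{-n_1(\tilde{s}_1)\left(I(X_1;X_2,Y,S|\tilde{S}_1 = \tilde{s}_1) - \epsilon\right)}.
\end{aligned} \tag{31}$$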
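For $P(E_4) \to 0$ as $n_1(\tilde{s}_1) \to \infty$, we need to choose 

$$\begin{aligned}
R_1(\tilde{s}_1) &< I(X_1;X_2,Y,S|\tilde{S}_1 = \tilde{s}_1) - \epsilon \\
&= I(X_1;Y|X_2,S,\tilde{S}_1 = \tilde{s}_1) + I(X_1;X_2,S|\tilde{S}_1 = \tilde{s}_1) - \epsilon \\
&\overset{(a)}{=} I(X_1;Y|X_2,S,\tilde{S}_1 = \tilde{s}_1) - \epsilon \\
&= H(Y|X_2,S,\tilde{S}_1 = \tilde{s}_1) - H(Y|X_1,X_2,S,\tilde{S}_1 = \tilde{s}_1) - \epsilon \\
&\overset{(b)}{=} H(Y|X_2,S,\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2) - H(Y|X_1,X_2,S,\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2) - \epsilon \\
&= I(X_1;Y|X_2,S,\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2) - \epsilon,
\end{aligned}$$

where (a) follows from the independence of $X_1$ and $(X_2,S)$ given $\tilde{S}_1 = \tilde{s}_1$, and (b) follows from the 
independence of $Y$ and $\tilde{S}_2$ given $(X_2,S,\tilde{S}_1 = \tilde{s}_1)$. 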

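Similarly, we can analyze the probability of error for the rest of the codebooks of encoder 1, i.e., $C_1^{\tilde{s}_1}$ for every 
$\tilde{s}_1 \in \mathcal{S}$. Therefore, as $n \to \infty$, 

$$\begin{aligned}
R_1 &\leq \sum_{\tilde{s}_1} \frac{n_1(\tilde{s}_1)}{n} R_1(\tilde{s}_1) \\
&< \sum_{\tilde{s}_1} \frac{n_1(\tilde{s}_1)}{n} \left( I(X_1;Y|X_2,S,\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2) - \epsilon \right) \\
&= \sum_{\tilde{s}_1} \left( P(\tilde{s}_1) - \epsilon' \right) \left( I(X_1;Y|X_2,S,\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2) - \epsilon \right) \\
&= I(X_1;Y|X_2,S,\tilde{S}_1,\tilde{S}_2) - \epsilon'',
\end{aligned} \tag{32}$$

where $\epsilon'' = \epsilon + \epsilon' \sum_{\tilde{s}_1} \left( I(X_1;Y|X_2,S,\tilde{S}_1 = \tilde{s}_1,\tilde{S}_2) - \epsilon \right)$. 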



Thus the total average probability of decoding error $P_e^{(n)} \to 0$ as $n \to \infty$ if $R_1 < I(X_1;Y|X_2,S,\tilde{S}_1,\tilde{S}_2)$ and $R_2 < 
I(X_2;Y|S,\tilde{S}_1,\tilde{S}_2)$. The achievability of the other corner point follows by changing the decoding order. To show 
achievability of other points in $\mathcal{R}(X_1,X_2)$, we use time sharing between corner points and points on the axes. 
Thus, the probability of error, conditioned on a particular codeword being sent, goes to zero if the following 
conditions are met: 

$$\begin{aligned} R_1 &< I(X_1;Y|X_2,S,\tilde{S}_1,\tilde{S}_2), \\ R_2 &< I(X_2;Y|X_1,S,\tilde{S}_1,\tilde{S}_2), \\ R_1+R_2 &< I(X_1,X_2;Y|S,\tilde{S}_1,\tilde{S}_2). \end{aligned}$$

The above bound shows that the average probability of error, which by symmetry is equal to the probability for an 
individual pair of codewords $(m_1,m_2)$, averaged over all choices of codebooks in the random code construction, 
is arbitrarily small. Hence, there exists at least one code $(n, 2^{nR_1}, 2^{nR_2}, d_1, d_2)$ with arbitrarily small probability 
of error. To complete the proof we use time-sharing to allow any $(R_1,R_2)$ in the convex hull to be achieved. ■ 






VI. ALTERNATIVE PROOF 



In this section we provide an alternative proof for Theorem 1. The alternative proof is based on a multi-letter expression for the capacity region of the FS-MAC with time-invariant feedback [12]. In order to use the capacity region of the FS-MAC with time-invariant feedback, we treat the knowledge of the state at the encoders as being part of the feedback from the decoder to the encoders.

Throughout this section we use the causal conditioning notation $(\cdot \| \cdot)$. We denote the probability mass function (pmf) of $Y^n$ causally conditioned on $X^{n-d}$, for some integer $d \ge 0$, as $P(y^n \| x^{n-d})$, which is defined as

$$P(y^n \| x^{n-d}) = \prod_{i=1}^{n} P(y_i \mid y^{i-1}, x^{i-d}) \tag{33}$$

(if $i - d \le 0$ then $x^{i-d}$ is set to null). The directed information $I(X^n \to Y^n)$ was defined by Massey in [15] as

$$I(X^n \to Y^n) \triangleq \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}). \tag{34}$$
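As an illustration of definition (34), the following minimal sketch evaluates the directed information of a small joint pmf by brute force. The function names, the dictionary layout of the pmf (outcomes ordered as $(x_1, \ldots, x_n, y_1, \ldots, y_n)$), the use of base-2 logarithms, and the binary symmetric channel used as a check are all assumptions made for this example and are not part of the paper.

```python
from itertools import product
from math import log2


def marginal(pmf, idx):
    """Marginalize a joint pmf (dict: outcome tuple -> prob) onto the coordinates in idx."""
    out = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_mi(pmf, a, b, c):
    """Conditional mutual information I(A; B | C) in bits; a, b, c are index tuples."""
    p_abc = marginal(pmf, a + b + c)
    p_ac = marginal(pmf, a + c)
    p_bc = marginal(pmf, b + c)
    p_c = marginal(pmf, c)
    total = 0.0
    for key, p in p_abc.items():
        if p <= 0.0:
            continue
        ka = key[:len(a)]
        kb = key[len(a):len(a) + len(b)]
        kc = key[len(a) + len(b):]
        total += p * log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


def directed_information(pmf, n):
    """I(X^n -> Y^n) = sum_i I(X^i; Y_i | Y^{i-1}), outcomes laid out as (x_1..x_n, y_1..y_n)."""
    total = 0.0
    for i in range(1, n + 1):
        x_upto_i = tuple(range(i))            # coordinates of X^i
        y_i = (n + i - 1,)                    # coordinate of Y_i
        y_past = tuple(range(n, n + i - 1))   # coordinates of Y^{i-1}
        total += cond_mi(pmf, x_upto_i, y_i, y_past)
    return total


def binary_entropy(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bsc_pmf(eps, n=2):
    """Hypothetical test case: i.i.d. uniform inputs over a memoryless BSC(eps), no feedback."""
    pmf = {}
    for outcome in product((0, 1), repeat=2 * n):
        xs, ys = outcome[:n], outcome[n:]
        p = 0.5 ** n
        for x, y in zip(xs, ys):
            p *= (1 - eps) if x == y else eps
        pmf[outcome] = p
    return pmf
```

For a memoryless channel used without feedback, the directed information reduces to the ordinary mutual information, so `directed_information(bsc_pmf(0.1), 2)` should agree with `2 * (1 - binary_entropy(0.1))`.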

Directed information has been widely used in the characterization of capacity of point-to-point channels fS], lfT6l . 
ifTTl . ifTSl . |fT9l . II20I . compound channels II2TI . network capacity ll22l . rate distortion ll23l . Il24l . and broadcast 
channel [25]. Directed information can also be expressed in terms of causal conditioning as

$$I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}) = E\left[\log \frac{P(Y^n \| X^n)}{P(Y^n)}\right], \tag{35}$$

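where $E$ denotes expectation. The directed information from $X_1^n$ to $Y^n$ causally conditioned on $X_2^n$ is defined as

$$I(X_1^n \to Y^n \| X_2^n) \triangleq \sum_{i=1}^{n} I(X_1^i; Y_i \mid Y^{i-1}, X_2^i) = E\left[\log \frac{P(Y^n \| X_1^n, X_2^n)}{P(Y^n \| X_2^n)}\right], \tag{36}$$

where $P(y^n \| x_1^n, x_2^n) = \prod_{i=1}^{n} P(y_i \mid y^{i-1}, x_1^i, x_2^i)$.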

Now let us present a result from [12] that we need for the proof. Consider the FS-MAC with time-invariant feedback as illustrated in Fig. 3. The channel is characterized by a conditional probability $P(y_i, s_{i+1} \mid x_{1,i}, x_{2,i}, s_i)$ that satisfies

$$P(y_i, s_{i+1} \mid x_1^i, x_2^i, s^i, y^{i-1}) = P(y_i, s_{i+1} \mid x_{1,i}, x_{2,i}, s_i). \tag{37}$$

In addition, we assume that the channel is stationary, indecomposable, and without ISI, i.e.,

$$P(y_i, s_{i+1} \mid x_{1,i}, x_{2,i}, s_i) = P(s_{i+1} \mid s_i) P(y_i \mid x_{1,i}, x_{2,i}, s_i) \tag{38}$$

and

$$P(s_0) = \pi(s_0), \tag{39}$$

where $\pi$ is the unique stationary distribution, i.e., $\lim_{n \to \infty} \Pr(S_n = s \mid s_0) = \pi(s)$, $\forall s_0 \in \mathcal{S}$.
Fig. 3: Channel with feedback, where the feedback is a time-invariant deterministic function of the output.



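Lemma 4 [12, Theorem 13]: The capacity region of a stationary, indecomposable FS-MAC without ISI and with time-invariant feedback, as illustrated in Fig. 3, is $\mathcal{R} = \lim_{n \to \infty} \mathcal{R}_n$, where $\mathcal{R}_n$ is the following region in $\mathbb{R}_+^2$:

$$\mathcal{R}_n = \bigcup_{P(x_1^n \| z_1^{n-1}) P(x_2^n \| z_2^{n-1})} \left\{ \begin{array}{l} R_1 < \frac{1}{n} I(X_1^n \to Y^n \| X_2^n), \\ R_2 < \frac{1}{n} I(X_2^n \to Y^n \| X_1^n), \\ R_1 + R_2 < \frac{1}{n} I((X_1, X_2)^n \to Y^n). \end{array} \right\} \tag{40}$$

In [12, Theorem 13] only the case where $d_1 = d_2 = 1$ was considered, but the result extends straightforwardly to any delays $d_1$ and $d_2$. The following theorem provides an alternative proof for Theorem 1 based on Lemma 4.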



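Theorem 5: Let us denote $\mathcal{R}_n$ and $\mathcal{R}$ to be the following regions in $\mathbb{R}_+^2$:

$$\mathcal{R}_n = \bigcup_{P(x_1^n \| s^{n-d_1}) P(x_2^n \| s^{n-d_2})} \left\{ \begin{array}{l} R_1 < \frac{1}{n} I(X_1^n \to Y^n, S^n \| X_2^n), \\ R_2 < \frac{1}{n} I(X_2^n \to Y^n, S^n \| X_1^n), \\ R_1 + R_2 < \frac{1}{n} I((X_1, X_2)^n \to Y^n, S^n), \end{array} \right\} \tag{41}$$

$$\mathcal{R} = \bigcup_{P(u \mid \tilde{s}_1) P(x_1 \mid \tilde{s}_1, u) P(x_2 \mid \tilde{s}_1, \tilde{s}_2, u)} \left\{ \begin{array}{l} R_1 < I(X_1; Y \mid X_2, S, \tilde{S}_1, \tilde{S}_2, U), \\ R_2 < I(X_2; Y \mid X_1, S, \tilde{S}_1, \tilde{S}_2, U), \\ R_1 + R_2 < I(X_1, X_2; Y \mid S, \tilde{S}_1, \tilde{S}_2, U). \end{array} \right\} \tag{42}$$

The capacity region for the FSM-MAC with CSI at the decoder and asymmetrical delayed CSI at the encoders with delays $d_1$ and $d_2$, as illustrated in Fig. 1, is $\lim_{n \to \infty} \mathcal{R}_n = \mathcal{R}$.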

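Proof:

In order to adapt the model in Fig. 3 to our model, we can consider the state information at the decoder as a part of the channel's output. Therefore, the capacity region is

$$\mathcal{R}_{\text{feedback}} = \lim_{n \to \infty} \bigcup_{P(x_1^n \| z_1^{n-d_1}) P(x_2^n \| z_2^{n-d_2})} \left\{ \begin{array}{l} R_1 < \frac{1}{n} I(X_1^n \to Y^n, S^n \| X_2^n), \\ R_2 < \frac{1}{n} I(X_2^n \to Y^n, S^n \| X_1^n), \\ R_1 + R_2 < \frac{1}{n} I((X_1, X_2)^n \to Y^n, S^n). \end{array} \right\} \tag{43}$$

Now, by choosing the deterministic functions of the output $z_{1,i}(y_i, s_i) = z_{2,i}(y_i, s_i) = s_i$, (43) yields the capacity region for the FSM-MAC with CSI at the decoder and asymmetrical delayed CSI at the encoders as shown in Fig. 1. Note that $\mathcal{R}_{\text{feedback}} = \lim_{n \to \infty} \mathcal{R}_n$, hence the capacity region is $\lim_{n \to \infty} \mathcal{R}_n$. In order to complete the proof we need to show that $\lim_{n \to \infty} \mathcal{R}_n = \mathcal{R}$. First let us show that $\lim_{n \to \infty} \mathcal{R}_n \supseteq \mathcal{R}$. Expanding the directed information terms, we have

$$\mathcal{R}_n = \bigcup_{P(x_1^n \| s^{n-d_1}) P(x_2^n \| s^{n-d_2})} \left\{ \begin{array}{l} R_1 < \frac{1}{n} \sum_{i=1}^{n} I(X_1^i; Y_i, S_i \mid X_2^i, Y^{i-1}, S^{i-1}), \\ R_2 < \frac{1}{n} \sum_{i=1}^{n} I(X_2^i; Y_i, S_i \mid X_1^i, Y^{i-1}, S^{i-1}), \\ R_1 + R_2 < \frac{1}{n} \sum_{i=1}^{n} I(X_1^i, X_2^i; Y_i, S_i \mid Y^{i-1}, S^{i-1}). \end{array} \right\}$$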

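To bound $R_1$, consider

$$\begin{aligned}
R_1 &< \frac{1}{n} \sum_{i=1}^{n} I(X_1^i; Y_i, S_i \mid X_2^i, Y^{i-1}, S^{i-1}) \\
&= \frac{1}{n} \sum_{i=1}^{n} \left[ H(Y_i, S_i \mid X_2^i, Y^{i-1}, S^{i-1}) - H(Y_i, S_i \mid X_1^i, X_2^i, Y^{i-1}, S^{i-1}) \right] \\
&= \frac{1}{n} \sum_{i=1}^{n} \left[ H(S_i \mid X_2^i, Y^{i-1}, S^{i-1}) + H(Y_i \mid X_2^i, Y^{i-1}, S^{i}) \right] \\
&\quad - \frac{1}{n} \sum_{i=1}^{n} \left[ H(S_i \mid X_1^i, X_2^i, Y^{i-1}, S^{i-1}) + H(Y_i \mid X_1^i, X_2^i, Y^{i-1}, S^{i}) \right] \\
&\stackrel{(a)}{=} \frac{1}{n} \sum_{i=1}^{n} \left[ H(S_i \mid S^{i-1}) + H(Y_i \mid X_2^i, Y^{i-1}, S^{i}) - H(S_i \mid S^{i-1}) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i) \right] \\
&= \frac{1}{n} \sum_{i=1}^{n} \left[ H(Y_i \mid X_2^i, Y^{i-1}, S^{i}) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i) \right],
\end{aligned}$$

where (a) follows from the fact that the channel is without ISI, and from the fact that the channel's output at time $i$ depends only on the state $S_i$ and the inputs $X_{1,i}$, $X_{2,i}$. We can bound $R_2$ and $R_1 + R_2$ in a similar way. Hence we obtain

$$\mathcal{R}_n = \bigcup_{P(x_1^n \| s^{n-d_1}) P(x_2^n \| s^{n-d_2})} \left\{ \begin{array}{l} R_1 < \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid X_2^i, Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i), \\ R_2 < \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid X_1^i, Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i), \\ R_1 + R_2 < \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i). \end{array} \right\}$$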
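Now using [19, Lemma 3], we have that $P(x_1^n \| s^{n-d_1}) P(x_2^n \| s^{n-d_2})$ uniquely determines $\{P(x_{1,i} \mid x_1^{i-1}, s^{i-d_1}) P(x_{2,i} \mid x_2^{i-1}, s^{i-d_2})\}_{i=1}^{n}$, hence

$$\mathcal{R}_n = \bigcup_{\{P(x_{1,i} \mid x_1^{i-1}, s^{i-d_1}) P(x_{2,i} \mid x_2^{i-1}, s^{i-d_2})\}_{i=1}^{n}} \left\{ \begin{array}{l} R_1 < \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid X_2^i, Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i), \\ R_2 < \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid X_1^i, Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i), \\ R_1 + R_2 < \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i). \end{array} \right\}$$

Let us assume that $d_1 \ge d_2$. Furthermore, we restrict the inputs of the channel by assuming that $P(x_{1,i} \mid x_1^{i-1}, s^{i-d_1}) = P(x_{1,i} \mid s_{i-d_1})$ and $P(x_{2,i} \mid x_2^{i-1}, s^{i-d_2}) = P(x_{2,i} \mid s_{i-d_1}, s_{i-d_2})$. Therefore,

$$\mathcal{R}_n \supseteq \bigcup_{\{P(x_{1,i} \mid s_{i-d_1}) P(x_{2,i} \mid s_{i-d_1}, s_{i-d_2})\}_{i=1}^{n}} \left\{ \begin{array}{l} R_1 < \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid X_2^i, Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i), \\ R_2 < \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid X_1^i, Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i), \\ R_1 + R_2 < \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i). \end{array} \right\}$$

Since we assumed that $P(x_{1,i} \mid x_1^{i-1}, s^{i-d_1}) = P(x_{1,i} \mid s_{i-d_1})$, we have the following equalities:

$$\begin{aligned}
P(y_i \mid x_2^i, y^{i-1}, s^i) &= \sum_{x_{1,i}} P(x_{1,i} \mid x_2^i, y^{i-1}, s^i) P(y_i \mid x_{1,i}, x_2^i, y^{i-1}, s^i) \\
&\stackrel{(a)}{=} \sum_{x_{1,i}} P(x_{1,i} \mid s_{i-d_1}) P(y_i \mid x_{1,i}, x_{2,i}, s_i),
\end{aligned} \tag{44}$$

where (a) follows from the fact that the channel's output at time $i$ depends only on the state $S_i$ and the inputs $X_{1,i}$, $X_{2,i}$, and from the fact that $P(x_{1,i} \mid x_2^i, y^{i-1}, s^i) = P(x_{1,i} \mid s^{i-d_1}) = P(x_{1,i} \mid s_{i-d_1})$. From (44) we get

$$H(Y_i \mid X_2^i, Y^{i-1}, S^i) = H(Y_i \mid X_{2,i}, S_i, S_{i-d_1}, S_{i-d_2}).$$

Similarly,

$$\begin{aligned}
H(Y_i \mid X_1^i, Y^{i-1}, S^i) &= H(Y_i \mid X_{1,i}, S_i, S_{i-d_1}, S_{i-d_2}), \\
H(Y_i \mid Y^{i-1}, S^i) &= H(Y_i \mid S_i, S_{i-d_1}, S_{i-d_2}).
\end{aligned}$$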



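Therefore,

$$\mathcal{R}_n \supseteq \bigcup_{\{P(x_{1,i} \mid s_{i-d_1}) P(x_{2,i} \mid s_{i-d_1}, s_{i-d_2})\}_{i=1}^{n}} \left\{ \begin{array}{l} R_1 < \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d_1}, S_{i-d_2}), \\ R_2 < \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{2,i} \mid X_{1,i}, S_i, S_{i-d_1}, S_{i-d_2}), \\ R_1 + R_2 < \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i}, X_{2,i} \mid S_i, S_{i-d_1}, S_{i-d_2}). \end{array} \right\}$$

Now, in order to obtain that $\lim_{n \to \infty} \mathcal{R}_n \supseteq \mathcal{R}$, we need to show that

$$\mathcal{R} \subseteq \lim_{n \to \infty} \bigcup_{\{P(x_{1,i} \mid s_{i-d_1}) P(x_{2,i} \mid s_{i-d_1}, s_{i-d_2})\}_{i=1}^{n}} \left\{ \begin{array}{l} R_1 < \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d_1}, S_{i-d_2}), \\ R_2 < \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{2,i} \mid X_{1,i}, S_i, S_{i-d_1}, S_{i-d_2}), \\ R_1 + R_2 < \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i}, X_{2,i} \mid S_i, S_{i-d_1}, S_{i-d_2}). \end{array} \right\}$$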

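Consider the region $\mathcal{R}$. An achievable region is uniquely determined for every fixed joint distribution $P(u \mid \tilde{s}_1) P(x_1 \mid \tilde{s}_1, u) P(x_2 \mid \tilde{s}_1, \tilde{s}_2, u)$. The rate $R_1$ is given by

$$\begin{aligned}
R_1 &< I(X_1; Y \mid X_2, S, \tilde{S}_1, \tilde{S}_2, U) \\
&= \sum_{\tilde{s}_1} P(\tilde{s}_1) \sum_{u} P(u \mid \tilde{s}_1) I(X_1; Y \mid X_2, S, \tilde{S}_1 = \tilde{s}_1, \tilde{S}_2, U = u).
\end{aligned} \tag{45}$$

In addition, we have

$$\begin{aligned}
\frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d_1}, S_{i-d_2}) &= \frac{1}{n} \sum_{i=1}^{n} \sum_{s_{i-d_1}} P(s_{i-d_1}) I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d_1} = s_{i-d_1}, S_{i-d_2}) \\
&\stackrel{(a)}{=} \sum_{\tilde{s}_1} P(\tilde{s}_1) \sum_{i=1}^{n} \frac{1}{n} I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d_1} = \tilde{s}_1, S_{i-d_2}),
\end{aligned} \tag{46}$$

where (a) follows from the fact that the distribution $P(s_{i-d_1})$ is stationary, therefore $P(s_{i-d_1}) = P(\tilde{s}_1)$. For every $U = u$ and $\tilde{S}_1 = \tilde{s}_1$, if $P(U = u \mid \tilde{S}_1 = \tilde{s}_1)$ is rational, i.e., of the form $k(u, \tilde{s}_1)/n$ where $k(u, \tilde{s}_1) \in \mathbb{N}$, then we can choose $k(u, \tilde{s}_1)$ terms from $\{P(x_{1,i} \mid s_{i-d_1}) P(x_{2,i} \mid s_{i-d_1}, s_{i-d_2})\}_{i=1}^{n}$ such that $P(x_{1,i} \mid s_{i-d_1}) P(x_{2,i} \mid s_{i-d_1}, s_{i-d_2}) = P(x_1 \mid \tilde{s}_1, u) P(x_2 \mid \tilde{s}_1, \tilde{s}_2, u)$. If $P(U = u \mid \tilde{S}_1 = \tilde{s}_1)$ is irrational, we can get arbitrarily close to $P(U = u \mid \tilde{S}_1 = \tilde{s}_1)$ by using longer and longer block lengths. Therefore, using (45) and (46) we have that when $n \to \infty$, for every given joint distribution $P(u \mid \tilde{s}_1) P(x_1 \mid \tilde{s}_1, u) P(x_2 \mid \tilde{s}_1, \tilde{s}_2, u)$, we can choose $\{P(x_{1,i} \mid s_{i-d_1}) P(x_{2,i} \mid s_{i-d_1}, s_{i-d_2})\}_{i=1}^{n}$ such that

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d_1}, S_{i-d_2}) = I(X_1; Y \mid X_2, S, \tilde{S}_1, \tilde{S}_2, U).$$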
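The rational-approximation step above can be sketched numerically. The snippet below is only an illustration under assumptions: the function name `time_sharing_counts` and the largest-remainder rounding rule are choices made for this example, not a construction taken from the paper. It assigns to each value $u$ an integer number of time slots $k(u)$ out of $n$ so that $k(u)/n$ approximates a target probability $P(u \mid \tilde{s}_1)$ to within $1/n$.

```python
from fractions import Fraction
from math import floor


def time_sharing_counts(probs, n):
    """Split n time slots among the values of U so that counts[j] / n approximates probs[j].

    Uses largest-remainder rounding (an assumption of this sketch): every count is
    floor(n * p) or floor(n * p) + 1, so the approximation error is below 1 / n.
    """
    exact = [Fraction(p) * n for p in probs]   # exact rational arithmetic
    counts = [floor(e) for e in exact]
    leftover = n - sum(counts)                 # slots still unassigned after flooring
    # Hand the leftover slots to the entries with the largest fractional parts.
    by_remainder = sorted(range(len(probs)),
                          key=lambda j: exact[j] - counts[j],
                          reverse=True)
    for j in by_remainder[:leftover]:
        counts[j] += 1
    return counts


def max_error(probs, n):
    """Largest deviation |k(u)/n - P(u)| over all u for block length n."""
    counts = time_sharing_counts(probs, n)
    return max(abs(Fraction(k, n) - Fraction(p)) for k, p in zip(counts, probs))
```

With the hypothetical target `[Fraction(1, 3), Fraction(2, 3)]`, block length 10 gives the counts `[3, 7]`, and the deviation returned by `max_error` shrinks as the block length grows, which mirrors the "longer and longer block lengths" argument in the text.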

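By using the same argument for $R_2$ and for $R_1 + R_2$, we get that for every given joint distribution $P(u \mid \tilde{s}_1) P(x_1 \mid \tilde{s}_1, u) P(x_2 \mid \tilde{s}_1, \tilde{s}_2, u)$, we can choose $\{P(x_{1,i} \mid s_{i-d_1}) P(x_{2,i} \mid s_{i-d_1}, s_{i-d_2})\}_{i=1}^{n}$ such that the following equalities hold simultaneously:

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d_1}, S_{i-d_2}) = I(X_1; Y \mid X_2, S, \tilde{S}_1, \tilde{S}_2, U), \tag{47}$$

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{2,i} \mid X_{1,i}, S_i, S_{i-d_1}, S_{i-d_2}) = I(X_2; Y \mid X_1, S, \tilde{S}_1, \tilde{S}_2, U), \tag{48}$$

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i}, X_{2,i} \mid S_i, S_{i-d_1}, S_{i-d_2}) = I(X_1, X_2; Y \mid S, \tilde{S}_1, \tilde{S}_2, U). \tag{49}$$

Using equations (47), (48), and (49), we obtain

$$\lim_{n \to \infty} \mathcal{R}_n \supseteq \mathcal{R}. \tag{50}$$

In order to complete the proof, we need to show that $\lim_{n \to \infty} \mathcal{R}_n \subseteq \mathcal{R}$. We have that

$$\mathcal{R}_n = \bigcup_{\{P(x_{1,i} \mid x_1^{i-1}, s^{i-d_1}) P(x_{2,i} \mid x_2^{i-1}, s^{i-d_2})\}_{i=1}^{n}} \left\{ \begin{array}{l} R_1 < \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid X_2^i, Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i), \\ R_2 < \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid X_1^i, Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i), \\ R_1 + R_2 < \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i). \end{array} \right\}$$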

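Consider the rate

$$\begin{aligned}
R_1 &< \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid X_2^i, Y^{i-1}, S^i) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i) \\
&\le \frac{1}{n} \sum_{i=1}^{n} H(Y_i \mid X_{2,i}, S_i, S_{i-d_2}, S^{i-d_1}) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i) \\
&= \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d_2}, S^{i-d_1}).
\end{aligned}$$

We can bound $R_2$ and $R_1 + R_2$ in a similar way. Hence we get

$$\mathcal{R}_n \subseteq \bigcup_{\{P(x_{1,i} \mid x_1^{i-1}, s^{i-d_1}) P(x_{2,i} \mid x_2^{i-1}, s^{i-d_2})\}_{i=1}^{n}} \left\{ \begin{array}{l} R_1 < \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d_2}, S^{i-d_1}), \\ R_2 < \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{2,i} \mid X_{1,i}, S_i, S_{i-d_2}, S^{i-d_1}), \\ R_1 + R_2 < \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i}, X_{2,i} \mid S_i, S_{i-d_2}, S^{i-d_1}). \end{array} \right\}$$

Now, consider the joint distribution $P(s_i, s_{i-d_2}, s^{i-d_1}, x_{1,i}, x_{2,i}, y_i)$,

$$P(s_i, s_{i-d_2}, s^{i-d_1}, x_{1,i}, x_{2,i}, y_i) \stackrel{(a)}{=} P(s_i, s_{i-d_2}, s^{i-d_1}) P(x_{1,i} \mid s^{i-d_1}) P(x_{2,i} \mid s^{i-d_1}, s_{i-d_2}) P(y_i \mid x_{1,i}, x_{2,i}, s_i),$$

where (a) follows from the fact that

$$P(x_{2,i} \mid s^{i-d_1}, s_{i-d_2}) = \sum_{m_2,\, s_{i-d_1+1}^{i-d_2-1}} P(m_2, s_{i-d_1+1}^{i-d_2-1} \mid s^{i-d_1}, s_{i-d_2}) P(x_{2,i} \mid s^{i-d_2}, m_2).$$

Note that $R_1$, $R_2$, and $R_1 + R_2$ are uniquely determined by the joint distributions $\{P(s_i, s_{i-d_2}, s^{i-d_1}, x_{1,i}, x_{2,i}, y_i)\}_{i=1}^{n}$. In the distribution $P(s_i, s_{i-d_2}, s^{i-d_1}, x_{1,i}, x_{2,i}, y_i)$ we control only $P(x_{1,i} \mid s^{i-d_1}) P(x_{2,i} \mid s^{i-d_1}, s_{i-d_2})$, since the distributions $P(s_i, s_{i-d_2}, s^{i-d_1})$ and $P(y_i \mid x_{1,i}, x_{2,i}, s_i)$ are determined by the channel transition probability. Hence,

$$\mathcal{R}_n \subseteq \bigcup_{W} \left\{ \begin{array}{l} R_1 < \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d_1}, S_{i-d_2}, S^{i-d_1-1}), \\ R_2 < \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{2,i} \mid X_{1,i}, S_i, S_{i-d_1}, S_{i-d_2}, S^{i-d_1-1}), \\ R_1 + R_2 < \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i}, X_{2,i} \mid S_i, S_{i-d_1}, S_{i-d_2}, S^{i-d_1-1}), \end{array} \right\}$$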

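where $W = \{P(x_{1,i} \mid s^{i-d_1}) P(x_{2,i} \mid s^{i-d_1}, s_{i-d_2})\}_{i=1}^{n}$. In the same way as we did in the proof of the converse (Section IV), we can rewrite these equations with the new variable $Q$, where $Q = i \in \{1, 2, \ldots, n\}$ with probability $\frac{1}{n}$. Furthermore, we denote $X_1 = X_{1,Q}$, $X_2 = X_{2,Q}$, $Y = Y_Q$, $S = S_Q$, $\tilde{S}_1 = S_{Q-d_1}$, $\tilde{S}_2 = S_{Q-d_2}$, and $U = (S^{Q-d_1-1}, Q)$. Hence we derive that

$$\mathcal{R}_n \subseteq \bigcup_{P(u \mid \tilde{s}_1) P(x_1 \mid \tilde{s}_1, u) P(x_2 \mid \tilde{s}_1, \tilde{s}_2, u)} \left\{ \begin{array}{l} R_1 < I(X_1; Y \mid X_2, S, \tilde{S}_1, \tilde{S}_2, U), \\ R_2 < I(X_2; Y \mid X_1, S, \tilde{S}_1, \tilde{S}_2, U), \\ R_1 + R_2 < I(X_1, X_2; Y \mid S, \tilde{S}_1, \tilde{S}_2, U), \end{array} \right\} \tag{52}$$

which completes the alternative proof of Theorem 1.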



VII. EXAMPLES 

In this section we apply the general results of Section III to obtain the capacity region for a finite-state Gaussian MAC and for the finite-state multiple-access fading channel. We derive optimization problems on the power allocation that maximizes the capacity region for these channels. This power allocation is the optimal power control policy for maximizing throughput in the presence of feedback delay.



A. Capacity Region for a Finite State Additive Gaussian MAC 

We now apply Theorem 1 to compute the capacity region of a power-constrained FS additive Gaussian noise (AGN) MAC, and illustrate the effect of the delayed CSI on the capacity region. For a finite-state AGN MAC the channel output $Y_i$ at time $i$, given the channel inputs $X_{1,i}$, $X_{2,i}$, is given by

$$Y_i = X_{1,i} + X_{2,i} + N_{S_i}, \tag{53}$$

where $N_{S_i}$ is a zero-mean Gaussian random variable with variance depending on the state $S_i$ of the channel at time $i$. In addition to the channel output $Y_i$, the receiver has access to the state $S_i$. The receiver feeds back the CSI to the transmitters through a noiseless feedback channel. The CSI from the receiver is received at transmitter 1 and transmitter 2 after time delays of $d_1$ and $d_2$ symbol durations, respectively. The state process is assumed to be Markov with steady-state distribution $\pi(s)$ and one-step transition matrix $K$. It is clear that the finite-state AGN is an FSMC. While the capacity region formula derived in Section III (Theorem 1) was for finite input and output alphabets, the result can be generalized to continuous alphabets with input constraints. First, we apply only the sum rate formula to explicitly determine the sum rate of the finite-state Markov AGN MAC with transmitter power constraints $\mathcal{P}_1$ and $\mathcal{P}_2$:

$$R_1 + R_2 < \max_{p(u \mid \tilde{s}_1) p(x_1 \mid \tilde{s}_1, u) p(x_2 \mid \tilde{s}_1, \tilde{s}_2, u)} I(X_1, X_2; Y \mid S, \tilde{S}_1, \tilde{S}_2, U), \tag{54}$$

subject to the power constraints

$$\sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{u} P(u \mid \tilde{s}_1) E[X_1^2 \mid \tilde{s}_1, u] \le \mathcal{P}_1, \tag{55}$$

$$\sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} P(\tilde{s}_2 \mid \tilde{s}_1) \sum_{u} P(u \mid \tilde{s}_1) E[X_2^2 \mid \tilde{s}_1, \tilde{s}_2, u] \le \mathcal{P}_2. \tag{56}$$

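To compute the maximum sum rate explicitly, we first have to determine the distributions $P(x_1 \mid \tilde{s}_1, u)$ and $P(x_2 \mid \tilde{s}_1, \tilde{s}_2, u)$ for each $\tilde{s}_1$, $\tilde{s}_2$, and $u$. Suppose $\mathcal{P}_1(\tilde{s}_1, u)$, $\mathcal{P}_2(\tilde{s}_1, \tilde{s}_2, u)$ is the power allocated to states $(\tilde{s}_1, \tilde{s}_2)$ and $u$. Therefore the sum rate satisfies

$$\begin{aligned}
I(X_1, X_2; Y \mid S, \tilde{S}_1, \tilde{S}_2, U) &= \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} P(\tilde{s}_2 \mid \tilde{s}_1) \sum_{s} P(s \mid \tilde{s}_2) \sum_{u} P(u \mid \tilde{s}_1) I(X_1, X_2; Y \mid s, \tilde{s}_1, \tilde{s}_2, u) \\
&\stackrel{(a)}{=} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} P(\tilde{s}_2 \mid \tilde{s}_1) \sum_{s} P(s \mid \tilde{s}_2) \sum_{u} P(u \mid \tilde{s}_1) \big( h(X_1 + X_2 + N_s \mid s, \tilde{s}_1, \tilde{s}_2, u) - h(N_s \mid s) \big) \\
&\stackrel{(b)}{\le} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} P(\tilde{s}_2 \mid \tilde{s}_1) \sum_{s} P(s \mid \tilde{s}_2) \sum_{u} P(u \mid \tilde{s}_1) \, \frac{1}{2} \log \frac{E[(X_1 + X_2 + N_s)^2 \mid s, \tilde{s}_1, \tilde{s}_2, u]}{E[N_s^2 \mid s]} \\
&\stackrel{(c)}{=} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} P(\tilde{s}_2 \mid \tilde{s}_1) \sum_{s} P(s \mid \tilde{s}_2) \sum_{u} P(u \mid \tilde{s}_1) \, \frac{1}{2} \log \left( 1 + \frac{\mathcal{P}_1(\tilde{s}_1, u) + \mathcal{P}_2(\tilde{s}_1, \tilde{s}_2, u)}{\sigma_s^2} \right) \\
&\stackrel{(d)}{\le} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} P(\tilde{s}_2 \mid \tilde{s}_1) \sum_{s} P(s \mid \tilde{s}_2) \, \frac{1}{2} \log \left( 1 + \frac{\mathcal{P}_1(\tilde{s}_1) + \mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)}{\sigma_s^2} \right),
\end{aligned} \tag{57}$$

where

(a) follows from the fact that $N_S$ is independent of $\tilde{S}_1$, $\tilde{S}_2$, $U$ given $S$.

(b) follows from the fact that the Gaussian distribution has the largest entropy for a given variance.

(c) follows from the fact that $X_1$, $X_2$ are independent of $N_S$ and independent of each other given $S$, $\tilde{S}_1$, $\tilde{S}_2$, and $U$. Furthermore, we denote $\mathcal{P}_1(\tilde{s}_1, u) = E[X_1^2 \mid \tilde{s}_1, u]$ and $\mathcal{P}_2(\tilde{s}_1, \tilde{s}_2, u) = E[X_2^2 \mid \tilde{s}_1, \tilde{s}_2, u]$.

(d) follows from Jensen's inequality.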

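Furthermore, we can achieve (57) if we choose $X_1(\tilde{s}_1, u)$ to be zero-mean Gaussian with variance $\mathcal{P}_1(\tilde{s}_1)$, and $X_2(\tilde{s}_1, \tilde{s}_2, u)$ to be zero-mean Gaussian with variance $\mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)$, both independent of $N_S$ and independent of each other. We now have the following result. For an FSM AGN MAC with average power constraints $\mathcal{P}_1$ and $\mathcal{P}_2$ and CSI at the transmitters with delays $d_1$ and $d_2$,

$$R_1 + R_2 = \max_{\mathcal{P}_1(\tilde{s}_1), \mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)} \frac{1}{2} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} K^{d_1 - d_2}(\tilde{s}_2, \tilde{s}_1) \sum_{s} K^{d_2}(s, \tilde{s}_2) \log \left( 1 + \frac{\mathcal{P}_1(\tilde{s}_1) + \mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)}{\sigma_s^2} \right), \tag{58}$$

subject to the power constraints

$$\sum_{\tilde{s}_1} \pi(\tilde{s}_1) \mathcal{P}_1(\tilde{s}_1) \le \mathcal{P}_1, \tag{59}$$

$$\sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} P(\tilde{s}_2 \mid \tilde{s}_1) \mathcal{P}_2(\tilde{s}_1, \tilde{s}_2) \le \mathcal{P}_2. \tag{60}$$

Similarly, we can derive the maximization problems for $R_1$ and $R_2$. For $R_1$:

$$R_1 = \max_{\mathcal{P}_1(\tilde{s}_1)} \frac{1}{2} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} K^{d_1 - d_2}(\tilde{s}_2, \tilde{s}_1) \sum_{s} K^{d_2}(s, \tilde{s}_2) \log \left( 1 + \frac{\mathcal{P}_1(\tilde{s}_1)}{\sigma_s^2} \right), \tag{61}$$

subject to the power constraint

$$\sum_{\tilde{s}_1} \pi(\tilde{s}_1) \mathcal{P}_1(\tilde{s}_1) \le \mathcal{P}_1, \tag{62}$$

and for $R_2$:

$$R_2 = \max_{\mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)} \frac{1}{2} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} K^{d_1 - d_2}(\tilde{s}_2, \tilde{s}_1) \sum_{s} K^{d_2}(s, \tilde{s}_2) \log \left( 1 + \frac{\mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)}{\sigma_s^2} \right), \tag{63}$$

subject to the power constraint

$$\sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} P(\tilde{s}_2 \mid \tilde{s}_1) \mathcal{P}_2(\tilde{s}_1, \tilde{s}_2) \le \mathcal{P}_2. \tag{64}$$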

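It is important to mention that in the general case the three equations (58), (61), and (63) do not achieve their maximum at the same distribution, i.e., not at the same power allocation. In the same way we can derive the maximization problem for two special cases. The first case is $d = d_1 = d_2$. Since the delays are the same, we denote $\tilde{S} = \tilde{S}_1 = \tilde{S}_2$, hence we have

$$R_1 = \max_{\mathcal{P}_1(\tilde{s})} \frac{1}{2} \sum_{\tilde{s}} \pi(\tilde{s}) \sum_{s} K^{d}(s, \tilde{s}) \log \left( 1 + \frac{\mathcal{P}_1(\tilde{s})}{\sigma_s^2} \right), \tag{65}$$

$$R_2 = \max_{\mathcal{P}_2(\tilde{s})} \frac{1}{2} \sum_{\tilde{s}} \pi(\tilde{s}) \sum_{s} K^{d}(s, \tilde{s}) \log \left( 1 + \frac{\mathcal{P}_2(\tilde{s})}{\sigma_s^2} \right), \tag{66}$$

$$R_1 + R_2 = \max_{\mathcal{P}_1(\tilde{s}), \mathcal{P}_2(\tilde{s})} \frac{1}{2} \sum_{\tilde{s}} \pi(\tilde{s}) \sum_{s} K^{d}(s, \tilde{s}) \log \left( 1 + \frac{\mathcal{P}_1(\tilde{s}) + \mathcal{P}_2(\tilde{s})}{\sigma_s^2} \right), \tag{67}$$

subject to the power constraints

$$\sum_{\tilde{s}} \pi(\tilde{s}) \mathcal{P}_1(\tilde{s}) \le \mathcal{P}_1, \tag{68}$$

$$\sum_{\tilde{s}} \pi(\tilde{s}) \mathcal{P}_2(\tilde{s}) \le \mathcal{P}_2. \tag{69}$$

The second case is $d_2 \le d_1 = \infty$. Let us denote $d = d_2$ and $\tilde{S} = \tilde{S}_2$, therefore we have

$$R_2 = \max_{\mathcal{P}_2(\tilde{s})} \frac{1}{2} \sum_{\tilde{s}} \pi(\tilde{s}) \sum_{s} K^{d}(s, \tilde{s}) \log \left( 1 + \frac{\mathcal{P}_2(\tilde{s})}{\sigma_s^2} \right), \tag{71}$$

$$R_1 + R_2 = \max_{\mathcal{P}_2(\tilde{s})} \frac{1}{2} \sum_{\tilde{s}} \pi(\tilde{s}) \sum_{s} K^{d}(s, \tilde{s}) \log \left( 1 + \frac{\mathcal{P}_1 + \mathcal{P}_2(\tilde{s})}{\sigma_s^2} \right), \tag{72}$$

subject to the power constraint

$$\sum_{\tilde{s}} \pi(\tilde{s}) \mathcal{P}_2(\tilde{s}) \le \mathcal{P}_2. \tag{73}$$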

s 

Now, to gain some intuition on the capacity region, we consider the case when there are only two states. At any given time $i$ the channel is in one of two possible states, $G$ or $B$. In the good state $G$, the channel is "good" and the noise variance is $\sigma_G^2$, and in the bad state $B$, the channel is "bad" and the noise variance is $\sigma_B^2$, where $\sigma_B^2 > \sigma_G^2$. The state process is specified by the transition probabilities

$$P(G \mid B) = g, \qquad P(B \mid G) = b.$$

The state process is illustrated in Fig. 4. The steady-state distribution of the Markov chain is given by

$$\pi(B) = \frac{b}{g + b}, \qquad \pi(G) = \frac{g}{b + g}.$$


By solving the optimization problems (58), (67), and (72) for the two-state example, we present the maximum sum rate versus delay plot in Fig. 5, which shows the effect of the CSI delay on the sum rate for $\mathcal{P}_1 = 10$, $\mathcal{P}_2 = 10$, $\sigma_G^2 = 1$, $\sigma_B^2 = 100$, $g = 0.1$, $b = 0.1$. The details on solving the optimization problem for the two-state example are presented in the Appendix.
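To make the effect of delay concrete, the following sketch evaluates the symmetric-delay sum rate (67) for the two-state example. It is not the solution method of the paper (which is deferred to an appendix); it relies on two assumptions stated here: in (67) only the total power $\mathcal{P}_1(\tilde{s}) + \mathcal{P}_2(\tilde{s})$ matters, so the problem reduces to a one-dimensional concave search over the total power spent in the delayed state $G$, and rates are measured in bits (base-2 logarithm). The function names and the ternary search are choices made for this illustration.

```python
from math import log2


def mat_pow(k, d):
    """d-step transition matrix of a 2-state chain, with k[prev][nxt]."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(d):
        result = [[sum(result[r][m] * k[m][c] for m in range(2)) for c in range(2)]
                  for r in range(2)]
    return result


def best_symmetric_sum_rate(d, p_total=20.0, var=(1.0, 100.0), g=0.1, b=0.1):
    """Maximize (67) for d1 = d2 = d; returns (sum rate in bits, total power used in state G).

    p_total is the combined budget P1 + P2; var = (sigma_G^2, sigma_B^2).
    Index 0 is the state G and index 1 is the state B.
    """
    k = [[1.0 - b, b], [g, 1.0 - g]]
    pi = [g / (g + b), b / (g + b)]
    kd = mat_pow(k, d)   # kd[t][s] = P(current state s | delayed state t)

    def rate(x_g):
        # The average power constraint fixes the power in B once x_g is chosen.
        x_b = (p_total - pi[0] * x_g) / pi[1]
        power = (x_g, x_b)
        return sum(pi[t] * kd[t][s] * 0.5 * log2(1.0 + power[t] / var[s])
                   for t in range(2) for s in range(2))

    # Ternary search: the objective is concave in x_g on [0, p_total / pi(G)].
    lo, hi = 0.0, p_total / pi[0]
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if rate(m1) < rate(m2):
            lo = m1
        else:
            hi = m2
    x_g = 0.5 * (lo + hi)
    return rate(x_g), x_g
```

With no delay ($d = 0$) the search puts all the power into the good state, as water-filling would; as $d$ grows the delayed state carries less information about the current one, the allocation flattens toward uniform power, and the achievable sum rate decreases.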
Fig. 4: Two-state AGN channel.

Fig. 5: The sum rate versus delay for the two-state channel: (a) $d_2 \le d_1 = \infty$, (b) $d_1 = d_2$, (c) $0 = d_2 \le d_1$.



The improvement in the sum rate due to CSI may seem small; however, when we encode large blocks, even a small improvement in the sum rate can be important. In addition, this improvement is specific to the two-state AGN-MAC example. In Fig. 6 we present the power control policy versus delay that achieves the maximum sum rates for the three cases.

Fig. 6: The power control policy versus delay that achieves the maximum sum rate: (a) $d_2 \le d_1 = \infty$, (b) $d_1 = d_2$, (c) $0 = d_2 \le d_1$.
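Now, we present the capacity rate region for the two-state AGN-MAC in the asymmetrical case $d_1 \ge d_2$ by solving numerically the following optimization problem for different values of $\alpha$:

$$\max_{R_1, R_2} \; \alpha R_1 + R_2, \tag{74}$$

subject to the constraints

$$R_1 \le \frac{1}{2} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} K^{d_1 - d_2}(\tilde{s}_2, \tilde{s}_1) \sum_{s} K^{d_2}(s, \tilde{s}_2) \log \left( 1 + \frac{\mathcal{P}_1(\tilde{s}_1)}{\sigma_s^2} \right), \tag{75}$$

$$R_2 \le \frac{1}{2} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} K^{d_1 - d_2}(\tilde{s}_2, \tilde{s}_1) \sum_{s} K^{d_2}(s, \tilde{s}_2) \log \left( 1 + \frac{\mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)}{\sigma_s^2} \right), \tag{76}$$

$$R_1 + R_2 \le \frac{1}{2} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} K^{d_1 - d_2}(\tilde{s}_2, \tilde{s}_1) \sum_{s} K^{d_2}(s, \tilde{s}_2) \log \left( 1 + \frac{\mathcal{P}_1(\tilde{s}_1) + \mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)}{\sigma_s^2} \right), \tag{77}$$

$$\sum_{\tilde{s}_1} \pi(\tilde{s}_1) \mathcal{P}_1(\tilde{s}_1) \le \mathcal{P}_1, \tag{78}$$

$$\sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} P(\tilde{s}_2 \mid \tilde{s}_1) \mathcal{P}_2(\tilde{s}_1, \tilde{s}_2) \le \mathcal{P}_2. \tag{79}$$

In order to solve the optimization problem (74) we used CVX, a package for specifying and solving convex optimization problems [26]. The capacity rate regions for $d_2 = 0$ and different values of $d_1$ are presented in Fig. 7.

Fig. 7: Capacity rate region for the two-state AGN-MAC, asymmetrical case $d_2 = 0$.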



Similarly, we solve the optimization problem for the symmetrical case $d_1 = d_2$, and for the case that transmitter 1 does not have any CSI, i.e., $d_2 \le d_1 = \infty$. The rate regions are illustrated in Fig. 8 and Fig. 9, respectively.

Fig. 9: Capacity rate region for the two-state AGN-MAC. Transmitter 1 does not have the CSI, $d_2 \le d_1 = \infty$.
B. Capacity Region for a Finite State Multiple-Access Fading Channel

We apply Theorem 1 to compute the capacity region of a power-constrained FS multiple-access fading channel, and illustrate the effect of the delayed CSI on the capacity region. Consider the discrete-time multiple-access Gaussian channel

$$Y_i = h_1(s_i) X_{1,i} + h_2(s_i) X_{2,i} + N_{s_i}, \tag{80}$$

where $X_{1,i}$, $X_{2,i}$ are the transmitted waveforms, and $h_1(s_i)$, $h_2(s_i)$ are the fading processes of the users. The terms $h_1(s_i)$, $h_2(s_i)$ are deterministic functions of $s_i$. The noise $N_{s_i}$ is a zero-mean Gaussian random variable with variance depending on the state of the channel at time $i$. Furthermore, the users are subject to the average transmitter power constraints $\mathcal{P}_1$ and $\mathcal{P}_2$. The state process is assumed to be Markov with steady-state distribution $\pi(s)$ and one-step transition matrix $K$, as described in Section II. The FS multiple-access fading channel is illustrated in Fig. 10.

Fig. 10: The fading channel.

We apply the capacity region formula to explicitly determine the capacity region of the multiple-access Gaussian fading channel with transmitter power constraints $\mathcal{P}_1$ and $\mathcal{P}_2$. In a similar way to the FSM additive Gaussian MAC, it can be shown that the capacity-achieving distributions are $X_1(\tilde{s}_1, u)$ zero-mean Gaussian with variance $\mathcal{P}_1(\tilde{s}_1)$, and $X_2(\tilde{s}_1, \tilde{s}_2, u)$ zero-mean Gaussian with variance $\mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)$, both independent of $N_S$ and independent of each other. We derive the following optimization problem:

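$$R_1 = \max_{\mathcal{P}_1(\tilde{s}_1)} \frac{1}{2} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} K^{d_1 - d_2}(\tilde{s}_2, \tilde{s}_1) \sum_{s} K^{d_2}(s, \tilde{s}_2) \log \left( 1 + \frac{h_1(s)^2 \mathcal{P}_1(\tilde{s}_1)}{\sigma_s^2} \right), \tag{81}$$

$$R_2 = \max_{\mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)} \frac{1}{2} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} K^{d_1 - d_2}(\tilde{s}_2, \tilde{s}_1) \sum_{s} K^{d_2}(s, \tilde{s}_2) \log \left( 1 + \frac{h_2(s)^2 \mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)}{\sigma_s^2} \right), \tag{82}$$

$$R_1 + R_2 = \max_{\mathcal{P}_1(\tilde{s}_1), \mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)} \frac{1}{2} \sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} K^{d_1 - d_2}(\tilde{s}_2, \tilde{s}_1) \sum_{s} K^{d_2}(s, \tilde{s}_2) \log \left( 1 + \frac{h_1(s)^2 \mathcal{P}_1(\tilde{s}_1) + h_2(s)^2 \mathcal{P}_2(\tilde{s}_1, \tilde{s}_2)}{\sigma_s^2} \right), \tag{83}$$

subject to the power constraints

$$\sum_{\tilde{s}_1} \pi(\tilde{s}_1) \mathcal{P}_1(\tilde{s}_1) \le \mathcal{P}_1, \tag{84}$$

$$\sum_{\tilde{s}_1} \pi(\tilde{s}_1) \sum_{\tilde{s}_2} P(\tilde{s}_2 \mid \tilde{s}_1) \mathcal{P}_2(\tilde{s}_1, \tilde{s}_2) \le \mathcal{P}_2. \tag{85}$$

In the same way, we can derive the optimization problem for the symmetrical case $d_1 = d_2$, and for the case that transmitter 1 does not have any CSI, i.e., $d_2 \le d_1 = \infty$. Let us solve the optimization problems for the following FSM multiple-access fading channel examples: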

1) Example 1 (AGN switch channel): Consider the discrete-time multiple-access Gaussian two-state switch channel as described in Fig. 11. We solve the optimization problem $\max(\alpha R_1 + R_2)$ for different values of $\alpha$ in the same way we did in the FS additive Gaussian noise (AGN) MAC example. In Figs. 12, 13, and 14 we present the capacity rate region for $\mathcal{P}_1 = 10$, $\mathcal{P}_2 = 10$, $\sigma_G^2 = 1$, $\sigma_B^2 = 10$, $g = 0.1$, $b = 0.1$, $h_1(G) = 1$, $h_1(B) = 0$, $h_2(G) = 0$, $h_2(B) = 1$, in the following cases: asymmetrical, symmetrical, and the case that transmitter 1 does not have any CSI.



Fig. 11: The channel behaves like a switch: at any given time $i$ the channel is in one of two possible states $G$ or $B$, where $\sigma_B^2 > \sigma_G^2$. The state process is illustrated in Fig. 4.

Fig. 12: Capacity rate region for the two-state switch channel, asymmetrical case $d_2 = 0$.

Fig. 13: Capacity rate region for the two-state switch channel, symmetrical case $d = d_1 = d_2$.

Fig. 14: Capacity rate region for the two-state switch channel. Transmitter 1 does not have the CSI, $d_2 \le d_1 = \infty$.

As one can see from Figs. 12, 13, and 14, the shape of the capacity rate region indicates that the users do not interfere with each other, so each of them can transmit at its own maximal rate independently of the other user. This makes perfect sense, since the transmission of each user depends only on the switch and not on the other user's transmission.
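The rectangular shape of the region can be checked directly from the rate expressions (81)-(83). The sketch below is an illustration under assumptions: it fixes a single power pair $(p_1, p_2)$ and a single conditional distribution of the current state (the `weights` argument), and the function and variable names are invented for this example. With the switch gains of Example 1, the sum-rate term equals the sum of the two individual terms, so the sum-rate constraint is never the binding one.

```python
from math import log2


def rate_terms(p1, p2, h1, h2, var, weights):
    """Evaluate the R1, R2 and sum-rate expressions for one fixed power pair (p1, p2).

    h1, h2, var and weights are dicts keyed by the current channel state;
    weights[s] is the probability of state s given the delayed CSI.
    Rates are in bits (base-2 logarithm, an assumption of this sketch).
    """
    r1 = sum(w * 0.5 * log2(1 + h1[s] ** 2 * p1 / var[s]) for s, w in weights.items())
    r2 = sum(w * 0.5 * log2(1 + h2[s] ** 2 * p2 / var[s]) for s, w in weights.items())
    r_sum = sum(w * 0.5 * log2(1 + (h1[s] ** 2 * p1 + h2[s] ** 2 * p2) / var[s])
                for s, w in weights.items())
    return r1, r2, r_sum


# Parameters of Example 1 (switch channel) and of Example 2 (fading channel).
SWITCH = dict(h1={"G": 1.0, "B": 0.0}, h2={"G": 0.0, "B": 1.0},
              var={"G": 1.0, "B": 10.0})
FADING = dict(h1={"G": 1.0, "B": 0.5}, h2={"G": 0.5, "B": 1.0},
              var={"G": 1.0, "B": 1.0})
```

For the switch channel, `rate_terms(10, 10, weights={"G": 0.5, "B": 0.5}, **SWITCH)` returns a sum-rate term equal to `r1 + r2`, whereas for the fading gains of Example 2 the sum-rate term is strictly smaller than `r1 + r2`, which is what produces the usual pentagon-shaped region there.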

2) Example 2 (Multiple-Access fading channel): Consider the power-constrained FS multiple-access fading channel as illustrated in Fig. 10 with only two states: $S = 1$, $S = 2$. The state process is Markov and is illustrated in Fig. 4 with a slight change: instead of denoting the states "good" and "bad" we use $S = 1$, $S = 2$. We solve the optimization problem $\max(\alpha R_1 + R_2)$ for different values of $\alpha$ in the same way we did before. In Figs. 15, 16, and 17 we present the capacity rate region for $\mathcal{P}_1 = 10$, $\mathcal{P}_2 = 10$, $\sigma_{s=1}^2 = \sigma_{s=2}^2 = 1$, $g = 0.1$, $b = 0.1$, $h_1(s = 1) = 1$, $h_1(s = 2) = 0.5$, $h_2(s = 1) = 0.5$, $h_2(s = 2) = 1$.

Fig. 15: Capacity rate region for the two-state fading channel, asymmetrical case $d_2 = 0$.

Fig. 16: Capacity rate region for the two-state fading channel, symmetrical case $d = d_1 = d_2$.



Fig. 17: Capacity rate region for the two-state fading channel. Transmitter 1 does not have the CSI, $d_2 \le d_1 = \infty$.

VIII. SUMMARY

The demand for high-rate multi-user communication systems is constantly increasing, so it becomes essential to achieve capacity by exploiting the channel structure. Motivated by this, we studied the finite-state MAC, where the channel state is a Markov process, the transmitters have access to delayed state information, and channel state information is available at the receiver. The delays of the channel state information are assumed to be asymmetric at the transmitters. We obtained a computable characterization of the capacity region for this channel. We provided the upper bound on the capacity region and the proof of achievability, which is based on multiplexing coding. In addition, we provided an alternative proof for the capacity region, based on a multi-letter expression for the capacity region of the FS-MAC with time-invariant feedback. We then applied the result to derive power control strategies that maximize the capacity region for the finite-state additive Gaussian MAC and for the multiple-access fading channel. The results and the insight in this paper are an intermediate step toward understanding network communication with delayed state information.



Appendix A 

CARDINALITY BOUND OF THE AUXILIARY RANDOM VARIABLE U 

Let us now prove the cardinality bound for Theorem 1, which is derived directly from the Fenchel-Eggleston-Carathéodory theorem [27]. Let us denote the set $\mathcal{Z}$ to be $\mathcal{Z} = \mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{S} \times \tilde{\mathcal{S}}_1 \times \tilde{\mathcal{S}}_2$, let $\mathcal{P}(\mathcal{Z})$ be the set of PMFs on $\mathcal{Z}$, and let $\mathcal{P}(\mathcal{Z} \mid \mathcal{U}) \subseteq \mathcal{P}(\mathcal{Z})$ be a collection of PMFs $p(z \mid u)$ on $\mathcal{Z}$ indexed by $u \in \mathcal{U}$. Let $g_j$, $j = 1, \ldots, k$, be continuous functions on $\mathcal{P}(\mathcal{Z} \mid \mathcal{U})$. Then, for any $U \sim F_U(u)$, there exists a finite random variable $U' \sim p(u')$ taking at most $k$ values in $\mathcal{U}$ such that

$$g_j(p_{Z|U}(z \mid U)) = \int_{\mathcal{U}} g_j(p_{Z|U}(z \mid u)) \, dF(u) \tag{86}$$

$$= \sum_{u'} g_j(p_{Z|U}(z \mid u')) \, p(u'). \tag{87}$$

Let us denote

$$g_1(p(z \mid u)) = I(X_1; Y \mid X_2, S, \tilde{S}_1, \tilde{S}_2, U = u), \tag{88}$$

$$g_2(p(z \mid u)) = I(X_2; Y \mid X_1, S, \tilde{S}_1, \tilde{S}_2, U = u), \tag{89}$$

$$g_3(p(z \mid u)) = I(X_1, X_2; Y \mid S, \tilde{S}_1, \tilde{S}_2, U = u); \tag{90}$$

then, by using the given technique, we can see that $|\mathcal{U}| \le 3$. By utilizing the same technique and similar considerations, we can bound the cardinality of the auxiliary variable in Theorem 2 to be $|\mathcal{W}| \le 3$ and the cardinality of the auxiliary variable in Theorem 3 to be $|\mathcal{Q}| \le 3$.

Appendix B 
PROOF OF THEOREM 3

The proof of Theorem 3 is similar to the case where the CSI is available at the decoder and asymmetrical delayed CSI is available at the encoders with delays $d_1$ and $d_2$ ($d_1 \ge d_2$), only now $d_1 \to \infty$. We give here the proof of the converse, and only a brief outline of the achievability proof. Since only encoder 2 has the CSI, we denote $d = d_2$ and $\tilde{S} = \tilde{S}_2$.

A. Converse of Theorem 3

Given an achievable rate pair $(R_1, R_2)$ we need to show that there exists a joint distribution of the form $P(s, \tilde{s}) P(q) P(x_1 \mid q) P(x_2 \mid \tilde{s}, q) P(y \mid x_1, x_2, s)$ such that

$$\begin{aligned}
R_1 &< I(X_1; Y \mid X_2, S, \tilde{S}, Q), \\
R_2 &< I(X_2; Y \mid X_1, S, \tilde{S}, Q), \\
R_1 + R_2 &< I(X_1, X_2; Y \mid S, \tilde{S}, Q),
\end{aligned}$$

where $Q$ is a random variable with cardinality bound $|\mathcal{Q}| \le 3$. The proof of the cardinality bound is similar to the proof in Appendix A. Since $(R_1, R_2)$ is an achievable rate pair, there exists a code $(n, 2^{nR_1}, 2^{nR_2}, d)$ with an arbitrarily small probability of error $P_e^{(n)}$. By Fano's inequality,

$$H(M_1, M_2 \mid Y^n, S^n) \le n(R_1 + R_2) P_e^{(n)} + H(P_e^{(n)}) \triangleq n\epsilon_n, \tag{91}$$

and it is clear that $\epsilon_n \to 0$ as $P_e^{(n)} \to 0$. Then we have

$$H(M_1 \mid Y^n, S^n) \le H(M_1, M_2 \mid Y^n, S^n) \le n\epsilon_n, \tag{92}$$

$$H(M_2 \mid Y^n, S^n) \le H(M_1, M_2 \mid Y^n, S^n) \le n\epsilon_n. \tag{93}$$
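We can now bound the rate $R_1$ as

$$\begin{aligned}
nR_1 &= H(M_1) \\
&= H(M_1) + H(M_1 \mid Y^n, S^n) - H(M_1 \mid Y^n, S^n) \\
&\stackrel{(a)}{\le} I(M_1; Y^n, S^n) + n\epsilon_n \\
&\stackrel{(b)}{=} I(M_1; Y^n \mid S^n) + I(M_1; S^n) + n\epsilon_n \\
&\stackrel{(c)}{=} I(M_1; Y^n \mid S^n) + n\epsilon_n \\
&\stackrel{(d)}{\le} I(X_1^n; Y^n \mid S^n) + n\epsilon_n \\
&= H(X_1^n \mid S^n) - H(X_1^n \mid Y^n, S^n) + n\epsilon_n \\
&\stackrel{(e)}{=} H(X_1^n \mid X_2^n, S^n) - H(X_1^n \mid Y^n, S^n) + n\epsilon_n \\
&\stackrel{(f)}{\le} H(X_1^n \mid X_2^n, S^n) - H(X_1^n \mid Y^n, X_2^n, S^n) + n\epsilon_n \\
&= I(X_1^n; Y^n \mid X_2^n, S^n) + n\epsilon_n \\
&= H(Y^n \mid X_2^n, S^n) - H(Y^n \mid X_1^n, X_2^n, S^n) + n\epsilon_n \\
&= \sum_{i=1}^{n} \left[ H(Y_i \mid Y^{i-1}, X_2^n, S^n) - H(Y_i \mid Y^{i-1}, X_1^n, X_2^n, S^n) \right] + n\epsilon_n \\
&\stackrel{(g)}{\le} \sum_{i=1}^{n} \left[ H(Y_i \mid X_{2,i}, S_i, S_{i-d}) - H(Y_i \mid Y^{i-1}, X_1^n, X_2^n, S^n) \right] + n\epsilon_n \\
&\stackrel{(h)}{=} \sum_{i=1}^{n} \left[ H(Y_i \mid X_{2,i}, S_i, S_{i-d}) - H(Y_i \mid X_{1,i}, X_{2,i}, S_i, S_{i-d}) \right] + n\epsilon_n \\
&= \sum_{i=1}^{n} I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d}) + n\epsilon_n,
\end{aligned}$$

where

(a) follows from Fano's inequality.

(b) follows from the chain rule.

(c) follows from the fact that $M_1$ and $S^n$ are independent.

(d) follows from the fact that $X_1^n$ is a deterministic function of $(M_1, S^n)$ and the Markov chain $(M_1, S^n) - (X_1^n, S^n) - Y^n$.

(e) follows from the fact that $X_1^n$ and $M_2$ are independent, and the fact that $X_2^n$ is a deterministic function of $(M_2, S^n)$. Therefore, $X_1^n$ and $X_2^n$ are independent given $S^n$.

(f) and (g) follow from the fact that conditioning reduces entropy.

(h) follows from the fact that the channel output at time $i$ depends only on the state $S_i$ and the inputs $X_{1,i}$ and $X_{2,i}$.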
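Hence, we have

$$R_1 \le \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d}) + \epsilon_n. \tag{94}$$

Similarly, we have

$$R_2 \le \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{2,i} \mid X_{1,i}, S_i, S_{i-d}) + \epsilon_n, \tag{95}$$

and the sum rate,

$$R_1 + R_2 \le \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i}, X_{2,i} \mid S_i, S_{i-d}) + \epsilon_n. \tag{96}$$

The expressions in (94), (95), and (96) are the averages of the mutual informations calculated at the empirical distribution in column $i$ of the codebook. We can rewrite these equations with the new variable $Q$, where $Q = i \in \{1, 2, \ldots, n\}$ with probability $\frac{1}{n}$. The equations become

$$\begin{aligned}
R_1 &\le \frac{1}{n} \sum_{i=1}^{n} I(Y_i; X_{1,i} \mid X_{2,i}, S_i, S_{i-d}) + \epsilon_n \\
&= \frac{1}{n} \sum_{i=1}^{n} I(Y_Q; X_{1,Q} \mid X_{2,Q}, S_Q, S_{Q-d}, Q = i) + \epsilon_n \\
&= I(Y_Q; X_{1,Q} \mid X_{2,Q}, S_Q, S_{Q-d}, Q) + \epsilon_n.
\end{aligned} \tag{97}$$

Now let us denote $X_1 \triangleq X_{1,Q}$, $X_2 \triangleq X_{2,Q}$, $Y \triangleq Y_Q$, $S \triangleq S_Q$, and $\tilde{S} \triangleq S_{Q-d}$. We have

$$\begin{aligned}
R_1 &\le I(X_1; Y \mid X_2, S, \tilde{S}, Q) + \epsilon_n, \\
R_2 &\le I(X_2; Y \mid X_1, S, \tilde{S}, Q) + \epsilon_n, \\
R_1 + R_2 &\le I(X_1, X_2; Y \mid S, \tilde{S}, Q) + \epsilon_n.
\end{aligned}$$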

Now we need to show that the following Markov relations hold:

1) $P(q \mid s, \tilde{s}) = P(q)$.

2) $P(x_1 \mid s, \tilde{s}, q) = P(x_1 \mid q)$.

3) $P(x_2 \mid x_1, s, \tilde{s}, q) = P(x_2 \mid \tilde{s}, q)$.

4) $P(y \mid x_1, x_2, s, \tilde{s}, q) = P(y \mid x_1, x_2, s)$.

We prove the above using the following claims:

1) follows from the fact that $Q$ and the state process $S^n$ are independent.

2) follows from the fact that $X_{1,i} = f_{1,i}(M_1)$ and that $M_1$ and $S^n$ are independent.

3) follows from the fact that $M_2$ and $(M_1, S^n)$ are independent, and the fact that the state process is a Markov chain. Therefore, we have the Markov chain $(M_2, S^{i-d}) - S_{i-d} - (M_1, S_i)$. Since $X_{1,i} = f_{1,i}(M_1)$ and $X_{2,i} = f_{2,i}(M_2, S^{i-d})$, where $f_{1,i}$, $f_{2,i}$ are deterministic functions, we get the following Markov chain:

$$X_{2,i} - (M_2, S^{i-d}) - S_{i-d} - (M_1, S_i) - X_{1,i}. \tag{98}$$

Therefore,

$$P(x_{2,i} \mid x_{1,i}, s_i, s_{i-d}) = P(x_{2,i} \mid s_{i-d}).$$

Since this is true for all $i$,

$$P(x_{2,q} \mid x_{1,q}, s_q, s_{q-d}, q) = P(x_{2,q} \mid s_{q-d}, q),$$

and we have $P(x_2 \mid x_1, s, \tilde{s}, q) = P(x_2 \mid \tilde{s}, q)$.

4) follows from the fact that the channel output at time $i$ depends only on the state $S_i$ and the inputs $X_{1,i}$ and $X_{2,i}$.

Hence, taking the limit as $n \to \infty$, $P_e^{(n)} \to 0$, we have the following converse:

$$\begin{aligned}
R_1 &\le I(X_1; Y \mid X_2, S, \tilde{S}, Q), \\
R_2 &\le I(X_2; Y \mid X_1, S, \tilde{S}, Q), \\
R_1 + R_2 &\le I(X_1, X_2; Y \mid S, \tilde{S}, Q),
\end{aligned}$$

for some choice of joint distribution $P(s, \tilde{s}) P(q) P(x_1 \mid q) P(x_2 \mid \tilde{s}, q) P(y \mid x_1, x_2, s)$ and for some choice of random variable $Q$ with $|\mathcal{Q}| \le 3$. This completes the proof of the converse.

B. Achievability Theorem\3\ 

To prove the achievability of the capacity region, we need to show that for a fixed $P(x_1)P(x_2 \mid \tilde{s})$ and $(R_1, R_2)$ that satisfy

\begin{align*}
R_1 &< I(X_1; Y \mid X_2, S, \tilde{S}), \\
R_2 &< I(X_2; Y \mid X_1, S, \tilde{S}), \\
R_1 + R_2 &< I(X_1, X_2; Y \mid S, \tilde{S}),
\end{align*}

there exists a sequence of $(n, 2^{nR_1}, 2^{nR_2}, d)$ codes where $P_e^{(n)} \to 0$ as $n \to \infty$. Without loss of generality we assume that the finite-state space is $\mathcal{S} = \{1, 2, \ldots, k\}$, and that the steady state probability $\pi(l) > 0$ for all $l \in \mathcal{S}$.

Encoder 1: construct $2^{nR_1}$ independent codewords $X_1^n(i)$ of length $n$, where $i \in \{1, 2, \ldots, 2^{nR_1}\}$, generating each symbol i.i.d., $X_1^n(i) \sim \prod_{l=1}^{n} P(x_{1,l})$.

Encoder 2: construct $k$ codebooks $\mathcal{C}_2^{\tilde{s}}$ (where the subscript is for Encoder 2), one for each $\tilde{s} \in \mathcal{S}$. Each codebook $\mathcal{C}_2^{\tilde{s}}$ contains $2^{n_2(\tilde{s})R_2(\tilde{s})}$ codewords, where $n_2(\tilde{s}) = (P(\tilde{S} = \tilde{s}) - \epsilon')n$, for $\epsilon' > 0$. Every codeword $\mathcal{C}_2^{\tilde{s}}(i)$, where $i \in \{1, 2, \ldots, 2^{n_2(\tilde{s})R_2(\tilde{s})}\}$, has a length of $n_2(\tilde{s})$ symbols. Each codeword from the $\mathcal{C}_2^{\tilde{s}}$ codebook is built with symbols $X_2 \sim$ i.i.d. $P(x_2 \mid \tilde{S} = \tilde{s})$ (where the subscript is for Encoder 2). A message $M_2$ is chosen according to a uniform distribution $\Pr(M_2 = m_2) = 2^{-nR_2}$, $m_2 \in \{1, 2, \ldots, 2^{nR_2}\}$. Every message $m_2$ is mapped into $k$ sub-messages $V_2(m_2) = \{V_2^1(m_2), V_2^2(m_2), \ldots, V_2^k(m_2)\}$ (one message from each codebook). Hence, every message $m_2$ is specified by a $k$-dimensional vector. For a fixed block length $n$, let $N_{\tilde{s}}$ be the number of times during the $n$ symbols for which the feedback information at encoder 2 regarding the channel state is $\tilde{S} = \tilde{s}$. Every time that the delayed CSI is $\tilde{S} = \tilde{s}$, encoder 2 sends the next symbol from the $\mathcal{C}_2^{\tilde{s}}$ codebook. Since $N_{\tilde{s}}$ is not necessarily equal to $n_2(\tilde{s})$, an error is declared if $N_{\tilde{s}} < n_2(\tilde{s})$, and the code is zero-filled if $N_{\tilde{s}} > n_2(\tilde{s})$. Therefore we can send a total of $2^{nR_2} = 2^{\sum_{\tilde{s} \in \mathcal{S}} n_2(\tilde{s})R_2(\tilde{s})}$ messages.
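The multiplexing rule used by Encoder 2 can be sketched in a few lines of Python. This is an illustration under simplifying assumptions rather than the construction used in the proof: codeword symbols are drawn uniformly from a binary alphabet instead of from $P(x_2 \mid \tilde{s})$, the codebook sizes are tiny, and the names `build_codebooks` and `multiplex_encode` are hypothetical.

```python
import random

def build_codebooks(num_codewords, lengths, rng):
    """One codebook per delayed state s, with num_codewords[s] codewords of
    length lengths[s]. Symbols are uniform binary (an assumption)."""
    return {s: [[rng.randint(0, 1) for _ in range(lengths[s])]
                for _ in range(num_codewords[s])]
            for s in lengths}

def multiplex_encode(sub_messages, codebooks, delayed_states):
    """Return the channel inputs of encoder 2, or None if an error is declared.

    sub_messages[s] is the index of the codeword chosen from codebook s.
    delayed_states[i] is the delayed CSI available to encoder 2 at time i.
    """
    position = {s: 0 for s in codebooks}      # next unsent symbol per codebook
    output = []
    for s in delayed_states:
        codeword = codebooks[s][sub_messages[s]]
        if position[s] < len(codeword):
            output.append(codeword[position[s]])
            position[s] += 1
        else:
            output.append(0)                  # zero-fill: state s occurred more than n_2(s) times
    # Error: some state occurred fewer than n_2(s) times, so its codeword is incomplete.
    if any(position[s] < len(codebooks[s][sub_messages[s]]) for s in codebooks):
        return None
    return output

rng = random.Random(0)
lengths = {1: 3, 2: 2}                        # n_2(s) for states 1 and 2
codebooks = build_codebooks({1: 4, 2: 2}, lengths, rng)
sub_messages = {1: 2, 2: 1}                   # one sub-message per codebook
x2 = multiplex_encode(sub_messages, codebooks, [1, 2, 1, 1, 2, 1])
too_short = multiplex_encode(sub_messages, codebooks, [1, 1, 1, 2])
```

In the first call state 1 occurs four times while its codeword has three symbols, so the last input is zero-filled; in the second call state 2 occurs only once, so an error is declared and `None` is returned.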

Decoding: we use successive decoding, similar to the decoding in Section V. It can be shown that the probability of error, conditioned on a particular codeword being sent, goes to zero if the following conditions are met:

\begin{align*}
R_1 &< I(X_1; Y \mid X_2, S, \tilde{S}), \\
R_2 &< I(X_2; Y \mid X_1, S, \tilde{S}), \\
R_1 + R_2 &< I(X_1, X_2; Y \mid S, \tilde{S}).
\end{align*}

The above bound shows that the average probability of error, which by symmetry is equal to the probability for an individual pair of codewords $(m_1, m_2)$, averaged over all choices of codebooks in the random code construction, is arbitrarily small. Hence there exists at least one code $(n, 2^{nR_1}, 2^{nR_2}, d)$ with an arbitrarily small probability of error. To complete the proof we use time-sharing to allow any $(R_1, R_2)$ in the convex hull to be achieved.

Appendix C 

DETERMINATION OF THE TWO-STATE MAC CAPACITY REGION 
For simplicity we give here the solution to the constrained optimization only for the symmetrical case, i.e., both CSI delays are the same ($d_1 = d_2$). The solutions of the other cases are obtained in a similar way. The optimization problem is:

\[ R_1 + R_2 = \max \sum_{\tilde{s}} \pi(\tilde{s}) \sum_{s} K^{d}(s, \tilde{s}) \log\left(1 + \cdots\right), \tag{99} \]

subject to the power constraints, 

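\begin{align*}
\sum_{\tilde{s}} \pi(\tilde{s})\mathcal{P}_1(\tilde{s}) &\leq \mathcal{P}_1, \tag{100}\\
\sum_{\tilde{s}} \pi(\tilde{s})\mathcal{P}_2(\tilde{s}) &\leq \mathcal{P}_2, \tag{101}\\
\mathcal{P}_1(\tilde{s}) &\geq 0, \quad \forall \tilde{s}, \tag{102}\\
\mathcal{P}_2(\tilde{s}) &\geq 0, \quad \forall \tilde{s}. \tag{103}
\end{align*}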



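The solution can be obtained by the Lagrange multiplier method. Since the objective function is monotonically increasing with respect to $\mathcal{P}_1(\tilde{s})$ and $\mathcal{P}_2(\tilde{s})$, it follows that the maximum is achieved when

\begin{align*}
\sum_{\tilde{s}} \pi(\tilde{s})\mathcal{P}_1(\tilde{s}) &= \mathcal{P}_1, \tag{104}\\
\sum_{\tilde{s}} \pi(\tilde{s})\mathcal{P}_2(\tilde{s}) &= \mathcal{P}_2. \tag{105}
\end{align*}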



Since log is a concave function and $\pi(\tilde{s}), K^{d}(s, \tilde{s}) \geq 0$, the objective function is concave in both variables $\mathcal{P}_1(\tilde{s})$ and $\mathcal{P}_2(\tilde{s})$. Also, the constraint functions (104) and (105) are affine. So we can use the Kuhn-Tucker conditions [28, Chapter 5.3.3] as sufficient conditions to solve the optimization problem. Application of the Kuhn-Tucker conditions gives the following conditions of optimality:



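\begin{align*}
\cdots &\leq \nu_1, \quad \forall \tilde{s}_i \in \{\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_k\}, \tag{106}\\
\cdots &\leq \nu_2, \quad \forall \tilde{s}_i \in \{\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_k\}, \tag{107}\\
&\cdots \tag{108}\\
&\cdots \tag{109}
\end{align*}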



with equality in (106) whenever $\mathcal{P}_1^*(\tilde{s}_i) > 0$, and equality in (107) whenever $\mathcal{P}_2^*(\tilde{s}_i) > 0$. For the two-state Gaussian MAC example in Section VII-A we have



(110) 



Now the solution to the constrained optimization problem is obtained by finding $\mathcal{P}_1^*(\tilde{s}_i)$ and $\mathcal{P}_2^*(\tilde{s}_i)$ that satisfy the Kuhn-Tucker conditions. For simplicity, in order to solve the optimization problem we used CVX, a package for specifying and solving convex optimization problems [26].
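As a rough illustration of how Kuhn-Tucker conditions of this type pin down a power allocation, the Python sketch below solves a simplified single-transmitter analogue by nested bisection instead of CVX. Everything in it is an assumption made for the example: the objective $\sum_{\tilde{s}} \pi(\tilde{s}) \sum_{s} K(s \mid \tilde{s}) \tfrac{1}{2}\ln(1 + P(\tilde{s})/\sigma_s^2)$, the state-dependent noise variances `noise_var`, the transition rows `trans`, and all function names. It is not the paper's two-user problem or its numerical setup.

```python
def marginal_gain(power, trans_row, noise_var):
    """Derivative in P of sum_s K(s|t) * 0.5 * ln(1 + P / sigma_s^2)."""
    return sum(k / (2.0 * (nv + power)) for k, nv in zip(trans_row, noise_var))

def power_for_multiplier(nu, trans_row, noise_var, iters=100):
    """Power satisfying the stationarity condition for multiplier nu:
    marginal_gain(P) = nu if P > 0, and P = 0 if marginal_gain(0) <= nu."""
    if marginal_gain(0.0, trans_row, noise_var) <= nu:
        return 0.0
    lo, hi = 0.0, 1.0 / (2.0 * nu)            # marginal_gain(hi) < nu since rows sum to 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if marginal_gain(mid, trans_row, noise_var) > nu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def allocate_power(pi, trans, noise_var, avg_power, iters=100):
    """Bisect on nu until sum_t pi[t] * P[t] meets the average power constraint."""
    nu_lo = 1.0 / (2.0 * (max(noise_var) + avg_power))   # every P[t] >= avg_power here
    nu_hi = max(marginal_gain(0.0, row, noise_var) for row in trans)  # every P[t] = 0 here
    for _ in range(iters):
        nu = 0.5 * (nu_lo + nu_hi)
        powers = [power_for_multiplier(nu, row, noise_var) for row in trans]
        if sum(w * p for w, p in zip(pi, powers)) > avg_power:
            nu_lo = nu                        # too much power used: raise the multiplier
        else:
            nu_hi = nu
    nu = 0.5 * (nu_lo + nu_hi)
    return [power_for_multiplier(nu, row, noise_var) for row in trans], nu

# Hypothetical two-state example: state 0 is "good", state 1 is "bad".
noise_var = [1.0, 10.0]                       # assumed noise variance per current state
trans = [[0.9, 0.1], [0.2, 0.8]]              # trans[t][s] = P(S = s | delayed state t)
pi = [2.0 / 3.0, 1.0 / 3.0]                   # assumed steady state probabilities
avg_power = 10.0
powers, nu = allocate_power(pi, trans, noise_var, avg_power)
```

The inner bisection enforces the stationarity condition for each delayed state, and the outer bisection enforces the power constraint with equality, mirroring the structure of the conditions above. In this toy example more power is assigned when the delayed state is the good one.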